ST310      Half Unit
Machine Learning

This information is for the 2021/22 session.

Teacher responsible

Dr Joshua Loftus

Availability

This course is available on the BSc in Actuarial Science and BSc in Mathematics, Statistics and Business. This course is available with permission as an outside option to students on other programmes where regulations permit and to General Course students.

Pre-requisites

Students must have completed Elementary Statistical Theory (ST102) and Mathematical Methods (MA100).

Familiarity with statistics to the level of ST102, and at least one of MA100, MA212, EC220, EC221, ST206, ST202, or equivalent.

Familiarity with basic computer programming in R or Python. Students who have no previous experience in R are strongly encouraged to take an online pre-sessional R course from the Digital Skills Lab (https://moodle.lse.ac.uk/course/view.php?id=7022).

Course content

The primary focus of this course is on core machine learning techniques in the context of high-dimensional or large datasets (i.e. big data). The first part of the course covers elementary and important statistical methods including nearest neighbours, linear regression, logistic regression, regularisation, cross-validation, and variable selection. The second part of the course deals with more advanced machine learning methods including regression and classification trees, random forests, bagging, boosting, deep neural networks, k-means clustering and hierarchical clustering. The course will also introduce causal inference, motivated by the analogy between double machine learning and two-stage least squares. All the topics will be delivered using illustrative real data examples. Students will also gain hands-on experience using R or Python (programming languages and software environments for data analysis, computing and visualisation).

Teaching

15 hours of lectures, 20 hours of seminars and 5 hours of help sessions in the MT.

This course will be delivered through a combination of classes, lectures and Q&A sessions totalling a minimum of 30 hours in Michaelmas Term. This year, some of this teaching may be delivered through a combination of virtual classes and flipped lectures delivered as short online videos. This course includes a reading week in Week 6 of Michaelmas Term.

Students are required to install R/Python on their own laptops.

Students who do not have a laptop of their own will be offered the use of personal computers available in seminar rooms.


Formative coursework

Students will be expected to produce 5 problem sets in the MT.

Indicative reading

James, G., Witten, D., Hastie, T. and Tibshirani, R. An Introduction to Statistical Learning with Applications in R. Springer, 2017.

Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning: Data Mining, Inference and Prediction. 2nd Edition, Springer, 2009.

Efron, B. and Hastie, T. Computer Age Statistical Inference. Cambridge University Press, 2016.

Wickham, H. and Grolemund, G. R for Data Science. O'Reilly, 2017.

Assessment

Exam (70%, duration: 2 hours) in the summer exam period.
Project (30%) in Week 3 of the LT.

Students are required to submit a group project applying the machine learning methods covered in this course to real data examples using R/Python (which accounts for 30% of the final assessment).

In addition to working with real data examples, this course introduces theoretical and methodological concepts in machine learning. These components will be tested by a written exam (which accounts for 70% of the final assessment).

Key facts

Department: Statistics

Total students 2020/21: 61

Average class size 2020/21: 31

Capped 2020/21: No

Value: Half Unit


Personal development skills

  • Self-management
  • Team working
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills