ST310 Half Unit
Machine Learning

This information is for the 2023/24 session.

Teacher responsible

Dr Joshua Loftus

Availability

This course is compulsory on the BSc in Data Science. This course is available on the BSc in Actuarial Science, BSc in Finance, BSc in Mathematics with Data Science, BSc in Mathematics with Economics, BSc in Mathematics, Statistics and Business and BSc in Politics and Data Science. This course is available as an outside option to students on other programmes where regulations permit. This course is available with permission to General Course students.

This course cannot be taken with ST309 Elementary Data Analytics.

Pre-requisites

Students must have completed either ST102, or ST109 and EC1C1, as well as a second-year course covering regression analysis.

Previous programming experience is not required, but students who have no previous experience in R must complete an online pre-sessional R course from the Digital Skills Lab before the start of the course (https://moodle.lse.ac.uk/course/view.php?id=7745).

Course content

This course focuses on core machine learning techniques in the context of high-dimensional or large datasets (i.e. big data). The first part of the course covers elementary and important statistical methods, including nearest neighbours, linear regression, logistic regression, regularisation, cross-validation, and variable selection. The second part of the course deals with more advanced machine learning methods, including regression and classification trees, random forests, bagging, boosting, deep neural networks, k-means clustering and hierarchical clustering. The course will also introduce causal inference, motivated by the analogy between double machine learning and two-stage least squares. All topics will be delivered using illustrative real data examples. Students will also gain hands-on experience using R or Python (programming languages and software environments for data analysis, computing and visualisation).
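As an illustration of two of the first-part topics, the sketch below combines nearest neighbours with cross-validation: it estimates the misclassification rate of a k-nearest-neighbours classifier for several values of k and keeps the value with the lowest estimate. It is a minimal example and not course material. The function names (`knn_predict`, `cv_error`, `make_toy_data`), the simulated two-class dataset and the candidate values of k are all assumptions made for this example, and it uses only the Python standard library.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Predict the majority label among the k training points nearest to query."""
    # Sort training indices by squared Euclidean distance to the query point.
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cv_error(xs, ys, k, n_folds=5):
    """Estimate the misclassification rate of k-NN by n_folds-fold cross-validation."""
    n = len(xs)
    mistakes = 0
    for fold in range(n_folds):
        # Every n_folds-th point (offset by fold) is held out for validation.
        held_out = set(range(fold, n, n_folds))
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        for i in held_out:
            if knn_predict(train_x, train_y, xs[i], k) != ys[i]:
                mistakes += 1
    return mistakes / n


def make_toy_data(n_per_class=60, seed=0):
    """Simulate two Gaussian classes in the plane (hypothetical toy data)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, centre in ((0, (0.0, 0.0)), (1, (2.0, 2.0))):
        for _ in range(n_per_class):
            xs.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            ys.append(label)
    # Shuffle so that each fold contains a mix of both classes.
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    return [xs[i] for i in idx], [ys[i] for i in idx]


xs, ys = make_toy_data()
# Odd values of k avoid tied votes when there are two classes.
errors = {k: cv_error(xs, ys, k) for k in (1, 3, 5, 11, 21)}
best_k = min(errors, key=errors.get)
print(errors)
print("k chosen by cross-validation:", best_k)
```

The same pattern (hold out part of the data, fit on the rest, compare held-out error across tuning values) carries over to choosing a regularisation strength for a penalised regression.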

Teaching

12 hours and 30 minutes of lectures, 16 hours and 40 minutes of seminars and 3 hours and 20 minutes of help sessions in the AT.

This course will be delivered through a combination of classes, lectures and Q&A sessions totalling a minimum of 30 hours in Autumn Term.

This course includes a reading week in Week 6 of Autumn Term.

Students are required to install R/RStudio on their own laptops.

Students who do not have a laptop of their own will be offered the use of personal computers available in seminar rooms.

Formative coursework

Students will be expected to produce 4 problem sets in the AT.

Indicative reading

  • James, G., Witten, D., Hastie, T. and Tibshirani, R. An Introduction to Statistical Learning with Applications in R. Springer, 2017.
  • Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning: Data Mining, Inference and Prediction. 2nd Edition, Springer, 2009.
  • Efron, B. and Hastie, T. Computer Age Statistical Inference. Cambridge University Press, 2016.
  • Wickham, H. and Grolemund, G. R for Data Science. O'Reilly, 2017.

Assessment

Coursework (15%) in the AT Week 5.
Coursework (15%) in the AT Week 10.
Project (40%) and group project (30%) in the WT.

Students are required to submit a group project that applies machine learning methods covered in this course to real data using R (which accounts for 30% of the final assessment), and an individual project that includes a prediction competition component (which accounts for 40% of the final assessment).

Alongside real data examples, this course introduces theoretical and methodological concepts in machine learning. These components will be tested by coursework in the form of problem sets (which account for 30% of the final assessment).

Key facts

Department: Statistics

Total students 2022/23: 72

Average class size 2022/23: 24

Capped 2022/23: Yes (75)

Value: Half Unit

Personal development skills

  • Self-management
  • Team working
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills