ST510      Half Unit
Foundations of Machine Learning

This information is for the 2023/24 session.

Teacher responsible

Dr Chengchun Shi, COL.8.08


This course is available on the MPhil/PhD in Statistics. This course is available with permission as an outside option to students on other programmes where regulations permit.

The availability as an outside option requires a demonstration of sufficient background in mathematics and statistics and is at the discretion of the instructor.


A knowledge of probability and statistical theory to the level of ST102 and ST206 and some parts of ST505 (e.g. linear models and generalized linear models). Some experience with computer programming will be assumed (e.g., Python, R).

Course content

The goal of this course is to provide students with a training in foundations of machine learning with a focus on statistical and algorithmic aspects. Students will learn fundamental statistical principles, algorithms, and how to implement and apply machine learning algorithms using the state-of-the-art Python packages such as scikit-learn, TensorFlow, and OpenAI Gym.

The course will cover the following topics:

  1. Foundations of supervised learning: empirical risk minimisation, empirical minimisation with inductive bias, PAC learning, learning via uniform convergence
  2. Convex optimisation: convexity, Newton-Raphson, gradient descent, stochastic gradient descent (SGD), acceleration by momentum, smoothness, strong convexity, convergence rates, alternating direction method of multipliers
  3. Non-convex optimisation: EM algorithm, MCMC, variational Bayesian inference, optimisation landscape, local minima and saddle points
  4. Support vector machines: margin and hard-SVM, soft-SVM and norm regularization, optimality conditions and support vectors, implementing soft-SVM using SGD
  5. Decision trees and random forests: sample complexity, decision tree algorithms, random forests
  6. Neural networks: feedforward neural networks, expressive power of neural networks, stochastic gradient descent and backpropagation
  7. Unsupervised learning - clustering: linkage-based clustering algorithms, k-means and other cost minimisation clustering, spectral clustering, information bottleneck
  8. Unsupervised learning - dimension reduction: PCA, matrix completion, autoencoder
  9. Online learning and optimisation: online learnability, online classification, weighted majority, online convex optimization, regret minimisation
  10. Reinforcement learning: multi-armed bandit processes, reinforcement learning problem, Markov Decision Problem, reinforcement learning solution methods


This course will be delivered through a combination of classes, lectures and Q&A sessions totalling a minimum of 35 hours in Winter Term. This course includes a reading week in Week 6 of Winter Term.

Formative coursework

Students will be expected to produce 9 problem sets in the WT.

Weekly problem sets that are discussed in subsequent seminars. The coursework that will be used for summative assessment will be chosen from a subset of these problems.

Indicative reading

  1. Avrim Blum, John Hopcroft and Ravindran Kannan, Foundations of Data Science, Cambridge University Press, 2020; text here  
  2. Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Cambridge University Press, 2004; text here
  3. Sebastien Bubeck, Convex optimization: algorithms and complexity, Now Publishers Inc. 2016; text here
  4. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, The MIT Press, 2016
  5. Aston Zhang, Zack C. Lipton, Mu Li, and Alex J. Smola, Deep Dive into Deep Learning, 2020; text here
  6. Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Springer, 2017 
  7. Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning: from Theory to Algorithms, Cambridge University Press, 2014; text here
  8. Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second Edition, MIT Press, Cambridge, MA, 2018; text here


Project (40%, 3000 words) and take-home assessment (20%) in the WT.
Oral examination (40%) in the ST.

The summative assessment will be based on four pieces of take-home assesment assignments (20% in total; 5% each), one project assignment (40%), and one oral exam (40%).

For the take-home assesments, students will be given homework problem sets and computer programming exercises in weeks 2, 4, 7, and 9.

The project assesment will be in April. Students will be asked to submit ther project reports within one month.

Key facts

Department: Statistics

Total students 2022/23: 9

Average class size 2022/23: 9

Value: Half Unit

Guidelines for interpreting course guide information

Course selection videos

Some departments have produced short videos to introduce their courses. Please refer to the course selection videos index page for further information.

Personal development skills

  • Self-management
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Commercial awareness
  • Specialist skills