MA429 Half Unit
Algorithmic Techniques for Data Mining

This information is for the 2020/21 session.

Teacher responsible

Gregory Sorkin


Availability

This course is available on the MSc in Applicable Mathematics, MSc in Marketing and MSc in Operations Research & Analytics. This course is available as an outside option to students on other programmes where regulations permit.

The course will be capped at 45 students.


Pre-requisites

Students are not permitted to take this course alongside ST443, Machine Learning and Data Mining.

Students must have knowledge of statistics and of the programming language R to the level of ST447, Data Analysis and Statistical Methods.

Course content

Data mining is an interdisciplinary field that has developed over the last three decades. Vast quantities of data are available today in all areas of business, science, and technology, as well as in social networks. The goal of data mining is to extract useful information from massive-scale data. The aim of the course is to equip students with a theoretically grounded and practically applicable knowledge of data mining. The theoretical foundations of the field come from mathematics, statistics, computer science and artificial intelligence.

The course introduces fundamental machine learning methods for basic data analytics problems. For classification and regression problems, these methods include naive Bayes, K-nearest neighbours, tree and forest construction, support vector machines, and neural networks. The course will also cover unsupervised learning methods such as clustering. The ethics of data mining is also discussed, from data collection through applications.

The methods are illustrated on practical problems arising from various fields. The course uses data mining packages in R.


Teaching

This course is delivered through a combination of classes and lectures totalling a minimum of 30 hours across Lent Term. This year, some or all of this teaching will be delivered through a combination of virtual classes and lectures delivered as online videos.

Formative coursework

There will be weekly homework assignments, some of which will be submitted for formative feedback. A mock project will be set as preparation for the summative group project.

Indicative reading

  • James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning: with Applications in R (2016)
  • Hastie, Tibshirani, Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed. (2009)
  • Witten, Frank, Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd or 4th ed. (2016)
  • Torgo, Data Mining with R: Learning with Case Studies (2010)


Assessment

Exam (50%, duration: 2 hours) in the summer exam period.
Project (40%) in the Summer Term (ST).
Coursework (10%) in the Lent Term (LT).

The examination is critical to assessment. In order to pass this course, students need to achieve a mark of at least 50% in the examination. A fail mark in the exam will result in an overall fail mark for the course and cannot be compensated by the mark achieved in the coursework element.

Important information in response to COVID-19

Please note that during the 2020/21 academic year some variation to teaching and learning activities may be required to respond to changes in public health advice and/or to account for the situation of students in attendance on campus and those studying online during the early part of the academic year. For assessment, this may involve changes to the mode of delivery and/or the format or weighting of assessments. Changes will only be made if required, and students will be notified of any changes to teaching or assessment plans at the earliest opportunity.

Key facts

Department: Mathematics

Total students 2019/20: 43

Average class size 2019/20: 22

Controlled access 2019/20: Yes

Value: Half Unit

Personal development skills

  • Self-management
  • Team working
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Commercial awareness
  • Specialist skills