MA429 Half Unit
Algorithmic Techniques for Data Mining

This information is for the 2022/23 session.

Teacher responsible

Dr Neil Olver

Availability

This course is available on the MSc in Applicable Mathematics, MSc in Marketing and MSc in Operations Research & Analytics. This course is available as an outside option to students on other programmes where regulations permit.

This course has a limited number of places (it is controlled access). Priority is given to MSc students in the Department of Mathematics.

Pre-requisites

Students are not permitted to take this course alongside ST443, Machine Learning and Data Mining.

Students must have knowledge of statistics and the programming language R to the level of ST447, Data Analysis and Statistical Methods.

Course content

Data mining is an interdisciplinary field that has developed over the last three decades. Vast quantities of data are available today in all areas of business, science and technology, as well as from social networks. The goal of data mining is to extract useful information from massive-scale data. The aim of the course is to equip students with theoretically grounded and practically applicable knowledge of data mining. The theoretical foundations of the field come from mathematics, statistics, computer science and artificial intelligence.

The course introduces fundamental machine learning methods for basic data analytics problems. For classification and regression problems, these methods include naive Bayes, k-nearest neighbours, tree and forest construction, support vector machines and neural networks. The course also covers unsupervised learning methods such as clustering, and discusses the ethics of data mining, from data collection through to applications.

The methods are illustrated on practical problems arising from various fields. The course uses data mining packages in R.

Teaching

This course is delivered through a combination of seminars and lectures totalling a minimum of 30 hours across the Lent Term.

Formative coursework

There will be a formative group project, in preparation for a similar summative project.

Indicative reading

  • James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning: with Applications in R (2016)
  • Hastie, Tibshirani, Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed. (2009)
  • Witten, Frank, Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd or 4th ed. (2016)
  • Torgo, Data Mining with R: Learning with Case Studies (2010)

Assessment

Exam (60%, duration: 2 hours) in the summer exam period.
Project (40%) in the Summer Term.

The examination is critical to assessment. To pass this course, students must achieve a mark of at least 50% in the examination. A fail mark in the exam results in an overall fail mark for the course: it cannot be compensated by the mark for the project.

Key facts

Department: Mathematics

Total students 2021/22: 34

Average class size 2021/22: 11

Controlled access 2021/22: Yes

Value: Half Unit

Personal development skills

  • Self-management
  • Team working
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Commercial awareness
  • Specialist skills