MA429      Half Unit
Algorithmic Techniques for Data Mining

This information is for the 2018/19 session.

Teacher responsible

Dr László Végh

Availability

This course is available on the MSc in Applicable Mathematics, MSc in Marketing and MSc in Operations Research & Analytics. This course is available as an outside option to students on other programmes where regulations permit.

The course will be capped at 45 students.

Pre-requisites

Students are not permitted to take this course alongside ST443, Machine Learning and Data Mining.

Students must have knowledge of statistics and the programming language R to the level of ST447, Data Analysis and Statistical Methods.

Course content

Data mining is an interdisciplinary field that has developed over the last three decades. Vast quantities of data are available today in all areas of business, science and technology. The main goal of data mining is to extract previously unknown, useful information from such massive-scale data. The aim of the course is to equip students with a theoretically grounded and practically applicable knowledge of data mining. The theoretical foundations of the field come from mathematics, statistics, computer science and artificial intelligence.

The course introduces fundamental machine learning methods and algorithms for basic data analytics problems. These methods include algorithms for classification and regression problems, such as tree construction, support vector machines, nearest-neighbour methods and Bayesian networks. The course will also cover unsupervised learning methods such as association rule mining and clustering.

The methods are illustrated on practical problems arising from various fields. The course will use data mining packages in R.

Teaching

20 hours of lectures and 15 hours of seminars in the LT. 2 hours of lectures in the ST.

Formative coursework

There will be weekly homework assignments, some of which will be submitted for formative feedback, and some for summative assessment (10% of the course mark). A mock project will be given as preparation for the summative group project.

Indicative reading

James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning: with Applications in R (2016)

Torgo, Data Mining with R: Learning with Case Studies (2010)

Hastie, Tibshirani, Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, Second Edition (2009)

Assessment

Exam (50%, duration: 2 hours) in the summer exam period.
Project (40%) in the ST.
Coursework (10%) in the LT.

The examination is critical to assessment. In order to pass this course, students need to achieve a mark of at least 50% in the examination. A fail mark in the exam will result in an overall fail mark for the course and cannot be compensated by the mark achieved in the coursework element.

Key facts

Department: Mathematics

Total students 2017/18: 48

Average class size 2017/18: 24

Controlled access 2017/18: Yes

Lecture capture used 2017/18: Yes (LT)

Value: Half Unit

Personal development skills

  • Self-management
  • Team working
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Commercial awareness
  • Specialist skills