MG4E1      Half Unit
Algorithmic Techniques for Data Mining

This information is for the 2016/17 session.

Teacher responsible

Dr Laszlo Vegh NAB 3.05

Availability

This course is available on the MSc in Applicable Mathematics, MSc in Management, MSc in Management (CEMS MIM), MSc in Management (MiM Exchange), MSc in Management Science (Decision Sciences) and MSc in Management Science (Operational Research). This course is available as an outside option to students on other programmes where regulations permit.

The course will be capped at 45 students.

Pre-requisites

Students are not permitted to take this course alongside ST443 Machine Learning and Data Mining.

Students must have basic knowledge of Mathematics and Statistics. The expected background in Statistics is familiarity with hypothesis testing, linear and logistic regression, to the level of MG4C5.

Course content

Data Mining is an interdisciplinary field that has developed over the last three decades. Vast quantities of data are available today in all areas of business, science, and technology. The main goal of data mining is to extract previously unknown, useful information from such massive-scale data. The aim of the course is to equip students with theoretically grounded and practically applicable knowledge of data mining. The theoretical foundations of the field come from statistics, computer science and artificial intelligence.

The course introduces fundamental machine learning methods and algorithms for basic data analytics problems. These methods include algorithms for tree construction and for rule generation, instance-based learning, regression methods, support vector machines, nearest-neighbour methods, Bayesian networks, website ranking, principal component analysis, association rule mining, and distance-based and density-based clustering.

The methods are illustrated on practical problems arising from various fields. The course also gives an introduction to the usage of the data mining software package Weka.

Teaching

20 hours of lectures and 13 hours and 30 minutes of seminars in the LT. 1 hour and 30 minutes of seminars in the ST.

A reading week will take place in W6. There will be no teaching during this week.

Formative coursework

Students will be expected to produce 1 project in the LT and 1 problem set in the ST.

Weekly homework assignments must be submitted; part of this work is formative and part counts towards the 10% coursework mark. A mock project will also be set, similar to the group project but with the dataset provided.

Indicative reading

Main textbook:

I. H. Witten, E. Frank, M. A. Hall: Data Mining - Practical Machine Learning Tools and Techniques.

Further reading:

T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning - Data Mining, Inference and Prediction;

P. Flach: Machine Learning: The Art and Science of Algorithms that Make Sense of Data, Cambridge University Press, 2012.

Assessment

Exam (45%, duration: 2 hours) in the main exam period.
Project (45%) in the ST.
Coursework (10%) in the LT.

Key facts

Department: Management

Total students 2015/16: 60

Average class size 2015/16: 20

Controlled access 2015/16: No

Value: Half Unit

Personal development skills

  • Self-management
  • Team working
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Commercial awareness
  • Specialist skills