MA429 Half Unit
Algorithmic Techniques for Data Mining
This information is for the 2018/19 session.
Dr László Végh
This course is available on the MSc in Applicable Mathematics, MSc in Marketing and MSc in Operations Research & Analytics. This course is available as an outside option to students on other programmes where regulations permit.
The course will be capped at 45 students.
Students are not permitted to take this course alongside ST443, Machine Learning and Data Mining.
Students must have knowledge of Statistics and the programming language R to the level of ST447, Data Analysis and Statistical Methods.
Data mining is an interdisciplinary field that has developed over the last three decades. Vast quantities of data are available today in all areas of business, science, and technology. The main goal of data mining is to extract previously unknown, useful information from such massive-scale data. The aim of the course is to equip students with theoretically founded and practically applicable knowledge of data mining. The theoretical foundations of the field come from mathematics, statistics, computer science and artificial intelligence.
The course introduces fundamental machine learning methods and algorithms for basic data analytics problems. These methods include algorithms for classification and regression problems, such as tree construction, support vector machines, nearest-neighbour methods and Bayesian networks. The course will also cover unsupervised learning methods such as association rule mining and clustering.
The methods are illustrated on practical problems arising from various fields. The course will use data mining packages in R.
20 hours of lectures and 15 hours of seminars in the LT. 2 hours of lectures in the ST.
There will be weekly homework assignments, some of which will be submitted for formative feedback, and some for summative assessment (10% of the course mark). A mock project will be given, as preparation for the summative group project.
James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning: with Applications in R (2016)
Torgo, Data Mining with R: Learning with Case Studies (2010)
Hastie, Tibshirani, Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, Second Edition (2009)
Exam (50%, duration: 2 hours) in the summer exam period.
Project (40%) in the ST.
Coursework (10%) in the LT.
The examination is critical to assessment. In order to pass this course, students need to achieve a mark of at least 50% in the examination. A fail mark in the exam will result in an overall fail mark for the course and cannot be compensated by the mark achieved in the coursework element.
Total students 2017/18: 48
Average class size 2017/18: 24
Controlled access 2017/18: Yes
Lecture capture used 2017/18: Yes (LT)
Value: Half Unit
Personal development skills
- Team working
- Problem solving
- Application of information skills
- Application of numeracy skills
- Commercial awareness
- Specialist skills