MA429 Half Unit
Algorithmic Techniques for Data Mining
This information is for the 2021/22 session.
Prof Gregory Sorkin
This course is available on the MSc in Applicable Mathematics, MSc in Marketing and MSc in Operations Research & Analytics. This course is available as an outside option to students on other programmes where regulations permit.
The course will be capped at 45 students.
Students are not permitted to take this course alongside ST443, Machine Learning and Data Mining.
Students must have knowledge of Statistics and the programming language R to the level of ST447, Data Analysis and Statistical Methods.
Data mining is an interdisciplinary field that has developed over the last three decades. Vast quantities of data are available today in all areas of business, science, and technology, as well as in social networks. The goal of data mining is to extract useful information from massive-scale data. The course aims to equip students with knowledge of data mining that is both theoretically grounded and practically applicable. The theoretical foundations of the field come from mathematics, statistics, computer science and artificial intelligence.
The course introduces fundamental machine learning methods for basic data analytics problems. For classification and regression problems, these methods include naive Bayes, K-nearest neighbours, tree and forest construction, support vector machines, and neural networks. The course will also cover unsupervised learning methods such as clustering. The ethics of data mining is discussed, from data collection through applications.
The methods are illustrated on practical problems arising from various fields. The course uses data mining packages in R.
This course is delivered through a combination of seminars and lectures totalling a minimum of 30 hours across Lent Term. This year, in addition to pre-recorded lecture videos, there will be a weekly one-hour live online session. Depending on circumstances, seminars may be held online.
There will be a formative group project, in preparation for a similar summative project.
- James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning: with Applications in R (2016)
- Hastie, Tibshirani, Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed. (2009)
- Witten, Frank, Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed. (2011) or 4th ed. (2016)
- Torgo, Data Mining with R: Learning with Case Studies (2010)
Exam (50%, duration: 2 hours) in the summer exam period.
Project (40%) in the Summer Term.
Continuous assessment (10%) in the Lent Term.
The examination is critical to assessment. In order to pass this course, students need to achieve a mark of at least 50% in the examination. A fail mark in the exam will result in an overall fail mark for the course: it cannot be compensated by the marks in the other elements.
Important information in response to COVID-19
Please note that during the 2021/22 academic year some variation to teaching and learning activities may be required to respond to changes in public health advice and/or to account for the differing needs of students in attendance on campus and those who might be studying online. For example, this may involve changes to the mode of teaching delivery and/or the format or weighting of assessments. Changes will only be made if required, and students will be notified of any changes to teaching or assessment plans at the earliest opportunity.
Total students 2020/21: 44
Average class size 2020/21: 15
Controlled access 2020/21: Yes
Value: Half Unit
Personal development skills
- Team working
- Problem solving
- Application of information skills
- Application of numeracy skills
- Commercial awareness
- Specialist skills