MY474 Half Unit
Applied Machine Learning for Social Science

This information is for the 2020/21 session.

Teacher responsible

Dr Blake Miller COL.7.14

Availability

This course is available on the MSc in Applied Social Data Science and MSc in Social Research Methods. This course is available as an outside option to students on other programmes where regulations permit.

Priority will be given to students in the MSc in Applied Social Data Science.

Pre-requisites

Applied Regression Analysis (MY452) or equivalent is required.

Course content

Machine learning uses algorithms to find patterns in large datasets and make predictions based on them. This course will use prominent examples from social science research to cover major machine learning tasks including regression, classification, clustering, and dimensionality reduction. Particular emphasis will be placed on the ethical issues surrounding machine learning applications, including privacy, algorithmic bias, and informed consent. Lectures will use case studies to introduce specific machine learning algorithms including LASSO, ridge regression, logistic regression, k-nearest neighbour classification, decision trees, support vector machines, k-means clustering, hierarchical clustering, principal component analysis, and linear discriminant analysis. Students will learn to apply these algorithms to data and to validate and evaluate the resulting models. Students will work directly with social data and analyse these data using Python or R.
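As an illustration only, and not taken from the course materials, the following Python sketch shows what applying and validating one of the listed algorithms can look like: k-nearest neighbour classification evaluated on a held-out test set. The toy data, the function names (knn_predict, make_toy_data, holdout_accuracy) and the choice of k = 3 are all assumptions made for this example; in practice a library such as scikit-learn would normally be used.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training point, nearest first.
    distances = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def make_toy_data(n_per_class=50, seed=0):
    """Generate two well-separated 2-D Gaussian clusters (hypothetical data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, (0.0, 0.0)), (1, (6.0, 6.0))):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            y.append(label)
    return X, y


def holdout_accuracy(X, y, k=3, test_fraction=0.3, seed=0):
    """Shuffle the data, hold out a test set, and report accuracy on it."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    # The model only ever sees the training points; test points are scored once.
    correct = sum(
        knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
    )
    return correct / n_test


X, y = make_toy_data()
accuracy = holdout_accuracy(X, y, k=3)
print(f"Hold-out accuracy with k=3: {accuracy:.2f}")
```

Scoring the model only on points it was not fitted to is the essential validation step; evaluating on the training data itself would overstate performance.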

Teaching

This course is delivered through a combination of classes and lectures totalling a minimum of 20 hours across Lent Term. This year, some or all of this teaching may be delivered through a combination of virtual classes and flipped lectures delivered as short online videos.

This course has a reading week in Week 6 of LT.

Formative coursework

Students will be expected to produce 1 problem set in the LT.

One structured problem set will be provided in the first weeks of the course. Students will start the problem set in the first computer workshop session and complete it outside of class.

Indicative reading

  • Géron, A. (2017). Hands-on Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media, Inc.
  • Müller, A. C., & Guido, S. (2016). Introduction to Machine Learning with Python: A Guide for Data Scientists. O'Reilly Media, Inc.
  • Conway, D., & White, J. (2012). Machine Learning for Hackers. O'Reilly Media, Inc.
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 112). New York: Springer.
  • Cantú, F., & Saiegh, S. M. (2011). Fraudulent democracy? An analysis of Argentina's Infamous Decade using supervised machine learning. Political Analysis, 19(4), 409-433.
  • Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated hate speech detection and the problem of offensive language. Proceedings of the Eleventh International AAAI Conference on Web and Social Media (ICWSM 2017), 512-515.
  • D'Orazio, V., Landis, S. T., Palmer, G., & Schrodt, P. (2014). Separating the wheat from the chaff: Applications of automated document classification using support vector machines. Political Analysis, 22(2), 224-242.
  • Jones, Z. M., & Lupu, Y. (2018). Is There More Violence in the Middle? American Journal of Political Science, 62(3), 652-667.
  • Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15), 5802-5805.
  • Wang, Y., & Kosinski, M. (2018). Deep neural networks are more accurate than humans at detecting sexual orientation from facial images. Journal of Personality and Social Psychology, 114(2), 246-257.

Assessment

In-class assessment (40%) and quiz (30%) in the LT.
Report (20%) and take-home assessment (10%) in the ST.

Important information in response to COVID-19

Please note that during the 2020/21 academic year some variation to teaching and learning activities may be required to respond to changes in public health advice and/or to account for the situation of students in attendance on campus and those studying online during the early part of the academic year. For assessment, this may involve changes to mode of delivery and/or the format or weighting of assessments. Changes will only be made if required and students will be notified about any changes to teaching or assessment plans at the earliest opportunity.

Key facts

Department: Methodology

Total students 2019/20: 24

Average class size 2019/20: 12

Controlled access 2019/20: Yes

Value: Half Unit

Personal development skills

  • Self-management
  • Team working
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills