Data Science for Public Policy

This information is for the 2023/24 session.

Teacher responsible

Casey Kearney


Availability

This course is compulsory on the MPA in Data Science for Public Policy. It is not available as an outside option.


Pre-requisites

Students must have completed the Pre-Sessional Coding and Mathematics Bootcamp (PP407). This will ensure that students have basic fluency in Python and its main Data Science libraries.

Course content

This course covers the theory and practice of the Data Science project lifecycle in Python for Public Policy, from problem definition and data sourcing and cleaning to exploration, visualization, and modelling. Emphasis will be placed on identifying problems that are suitable for different Data Science techniques and on good practices for managing data. Linear and logistic models, regularization techniques and basic time-series models will be covered in the AT; more advanced time-series and ML/AI models will be left for the WT. Key concepts and ideas underlying modelling (bias vs. variance, types of error, training vs. test data), along with data ethics and data science ethics, will be illustrated and implemented with examples from healthcare, education, urban policy, international development, and other policy areas. By the end of the course, students will have a strong coding workflow and will be able to source and experiment with data for analysis and research, both individually and in a collaborative environment.


Teaching

15 hours of lectures and 15 hours of seminars in the AT. 15 hours of lectures and 15 hours of seminars in the WT.

Formative coursework

Students will be expected to produce 2 pieces of coursework in the AT and WT.

Indicative reading

These books provide an excellent starting point and can be used as the main reference for many topics. A full reading list will be provided at the beginning of the course.

  1. Gareth James, Daniela Witten, Trevor Hastie and Rob Tibshirani (2021) - An Introduction to Statistical Learning
  2. Jeffrey C. Chen, Edward A. Rubin and Gary J. Cornwall (2021) - Data Science for Public Policy
  3. Claus O. Wilke (2019) - Fundamentals of Data Visualization


Assessment

Exam (40%, duration: 3 hours, reading time: 15 minutes) in the spring exam period.
Coursework (30%), policy memo (15%) and presentation (15%) in the AT and WT.

Coursework comprises weekly coding notebooks, to be completed by the student, and in-class participation. Students will also complete a policy memo, give a presentation and take a final exam for the course.

Key facts

Department: School of Public Policy

Total students 2022/23: Unavailable

Average class size 2022/23: Unavailable

Controlled access 2022/23: No

Value: One Unit

Personal development skills

  • Leadership
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills