ST505 Half Unit
Statistical Modeling and Data Analysis

This information is for the 2020/21 session.

Teacher responsible

Dr Yunxiao Chen


Availability

This course is available on the MPhil/PhD in Statistics. This course is not available as an outside option.

In the first year, this course will be open only to MPhil/PhD in Statistics. In the future, it may be made available as an outside option to students on other programmes where regulations permit. 


Pre-requisites

Knowledge of probability and statistical theory to the level of ST102 and ST206, and of linear regression to the level of ST211, is required. Some experience with R or other statistical software or programming languages (e.g., Python, Matlab) will be assumed.


Course content

This course provides an overview of modern applied statistics. It covers an introduction to quantitative research design, exploratory data analysis and data visualisation, generalised linear models, and generalised latent variable models (including mixed effects or multilevel models, longitudinal data analysis, and structural equation models). The course has an applied emphasis: students gain hands-on experience in data analysis using R and practice in interpreting results.


Teaching

This course will be delivered through a combination of classes and lectures totalling a minimum of 30 hours across Michaelmas Term. This year, some or all of this teaching may be delivered through a combination of virtual classes and flipped lectures delivered as short online videos. This course includes a reading week in Week 6 of Michaelmas Term.

Formative coursework

Students will be expected to produce 1 project in the MT.

Students will be given a real dataset and asked to analyse the data to answer scientific questions and then write a report. Students' reports will be marked and feedback will be given.

Indicative reading

Maindonald, J., & Braun, J. (2006). Data analysis and graphics using R: an example-based approach. Cambridge University Press.

Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.

Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: multilevel, longitudinal, and structural equation models. Chapman & Hall/CRC.

Imbens, G. W., & Rubin, D. B. (2015). Causal inference for statistics, social, and biomedical sciences: an introduction. Cambridge University Press.


Assessment

Project (30%, 1000 words) in the MT.
Take-home assessment (70%) in the LT.

The summative assessment will be based on one piece of coursework (30%) and one take-home exam (70%). For the coursework, students will be given a dataset in week 6 and asked to analyse the data to answer several scientific questions and submit a report in week 10. The take-home exam will be in January. The take-home exam should be no fewer than 3000 words and students will be asked to submit this within three days.

Important information in response to COVID-19

Please note that during the 2020/21 academic year some variation to teaching and learning activities may be required to respond to changes in public health advice and/or to account for the situation of students in attendance on campus and those studying online during the early part of the academic year. For assessment, this may involve changes to the mode of delivery and/or the format or weighting of assessments. Changes will only be made if required, and students will be notified about any changes to teaching or assessment plans at the earliest opportunity.

Key facts

Department: Statistics

Total students 2019/20: 3

Average class size 2019/20: 2

Value: Half Unit

Personal development skills

  • Self-management
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills