DS202W      Half Unit
Data Science for Social Scientists

This information is for the 2023/24 session.

Teacher responsible

Dr Ghita Berrada COL.1.02


Availability

This course is compulsory on the BSc in Politics and Data Science and the BSc in Psychological and Behavioural Science. It is available as an outside option to students on other programmes where regulations permit, and to General Course students.

This course cannot be taken with ST201 Statistical Models and Data Analysis.

Material from the previous year can be found on the course's dedicated public webpage: https://lse-dsi.github.io/DS202/


Pre-requisites

A-level maths or equivalent.

An important note on programming: While programming is not strictly a pre-requisite for this course, basic programming knowledge, preferably in Python or R, is highly recommended. Students should be comfortable creating and updating variables, writing simple functions, and using flow control constructs such as if-else statements and for and while loops. Those who are new to coding may find the course challenging, and we encourage them to consider the Winter iteration of the course, DS202W, which gives them additional time to improve their programming skills. We recommend that students with limited programming experience explore courses such as ST101, the Digital Skills Lab workshops or the self-paced pre-sessional course listed on the DS202 Moodle page.
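
As a rough self-check, the basics named in the note above look something like the following minimal sketch. It is written in Python, one of the two languages the note mentions; the course itself uses R, where the same ideas apply with different syntax. The function name, thresholds and values are made up for illustration and are not taken from the course materials.

```python
# Creating and updating a variable
count = 0
count = count + 1  # count now holds 1


# A simple function that uses if-else flow control.
# The thresholds 70 and 40 are arbitrary, illustrative choices.
def describe_score(score):
    """Return a coarse label for a numeric score."""
    if score >= 70:
        return "high"
    elif score >= 40:
        return "medium"
    else:
        return "low"


# A for loop: apply the function to each element of a list
scores = [35, 55, 82]
labels = []
for s in scores:
    labels.append(describe_score(s))

# A while loop: keep halving a value until it drops below 1
value = 20.0
steps = 0
while value >= 1:
    value = value / 2
    steps += 1

print(count, labels, steps)
```

Students who can read this snippet and predict what it prints are likely to have the level of programming the note describes.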


Course content

The main goal of this course is to provide students with a hands-on introduction to the most fundamental machine learning algorithms, as well as the metrics commonly used to assess algorithmic performance and decision-making aspects in real-life scenarios. The course will be taught through a combination of staff-led lectures and classes, with a primary focus on practical applications. R will be the primary programming language, and there will be a recap of the tidyverse set of packages in the first weeks of the course. 

In terms of content, the learning objectives of this course are to:

  • Understand the fundamentals of the data science approach, with an emphasis on social scientific analysis and the study of the social, political, and economic worlds;
  • Understand how classical methods such as regression analysis or principal components analysis can be treated as machine learning approaches for prediction or for data mining;
  • Know how to fit and apply supervised machine learning models for classification and prediction;
  • Know how to evaluate and compare fitted models, and to improve model performance;
  • Develop applied programming skills through hands-on course exercises;
  • Apply the methods learned to real data through hands-on exercises;
  • Integrate the insights from data analytics into knowledge generation and decision-making;
  • Understand an introductory framework for working with natural language (text) data using techniques of machine learning;
  • Learn how data science methods have been applied to a particular domain of study (applications).
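
To give a flavour of what "fit" and "evaluate and compare fitted models" can mean in practice, here is a minimal sketch, not drawn from the course materials. It fits a one-variable linear regression by ordinary least squares on a training set, then compares its mean squared error on held-out test data against a baseline that always predicts the training mean. The data are toy numbers invented for illustration, the helper names are hypothetical, and the sketch uses plain Python rather than the R packages used in the course.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Toy data (invented): y is roughly 2 * x + 1 plus a little noise
x_train = [1, 2, 4, 5, 7, 8]
y_train = [3.1, 4.9, 9.2, 10.8, 15.1, 16.9]
x_test = [3, 6]
y_test = [7.2, 12.9]

# Fit on the training data only
intercept, slope = fit_simple_ols(x_train, y_train)

# Evaluate on held-out data the model has not seen
model_pred = [intercept + slope * x for x in x_test]
model_mse = mean_squared_error(y_test, model_pred)

# Baseline for comparison: always predict the training mean
baseline_pred = [mean(y_train)] * len(y_test)
baseline_mse = mean_squared_error(y_test, baseline_pred)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"test MSE: model={model_mse:.3f}, baseline={baseline_mse:.3f}")
```

The design point is the separation of roles: the model only sees the training data when it is fitted, and the comparison between models is made on data held back for testing.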



Teaching

16 hours and 40 minutes of lectures and 13 hours and 30 minutes of classes in the Winter Term (WT).

This course will have a Reading Week in Week 6.

Formative coursework

Students will be expected to produce 10 problem sets in the WT.

Students will work on weekly, structured problem sets in the staff-led class sessions. Example solutions will be provided at the end of each week.

Indicative reading

  • Wickham, Hadley, and Garrett Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 1st edition. Sebastopol [CA]: O’Reilly, 2016. Made freely available online by the authors: https://r4ds.had.co.nz/.
  • Ismay, Chester, and Albert Young-Sun Kim. Statistical Inference via Data Science: A ModernDive into R and the Tidyverse. Chapman & Hall/CRC The R Series. Boca Raton: CRC Press / Taylor & Francis Group, 2020. Made freely available online by the authors: https://moderndive.com/.
  • James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning: With Applications in R. 2nd edition. Springer Texts in Statistics. New York [NY]: Springer, 2021. Made freely available online by the authors: https://www.statlearning.com/.
  • Zumel, Nina, and John Mount. Practical Data Science with R. 1st edition. Shelter Island [NY]: Manning Publications Co, 2014.
  • Kuhn, Max, and Julia Silge. Tidy Modeling with R: A Framework for Modeling in the Tidyverse. 1st edition. Beijing; Boston; Farnham; Sebastopol; Tokyo: O’Reilly, 2022. Made freely available online by the authors: https://www.tmwr.org/.
  • Silge, Julia, and David Robinson. Text Mining with R: A Tidy Approach. 1st edition. Beijing [China]; Boston [MA]: O’Reilly, 2017. Made freely available online by the authors: https://www.tidytextmining.com/.
  • Grimmer, Justin, Margaret E. Roberts, and Brandon M. Stewart. Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton: Princeton University Press, 2022.


Assessment

Online assessment (40%, duration: 4 hours) in the Spring exam period.
Coursework (60%, 2000 words) in the WT.

Key facts

Department: Data Science Institute

Total students 2022/23: Unavailable

Average class size 2022/23: Unavailable

Capped 2022/23: No

Value: Half Unit

Personal development skills

  • Self-management
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills