DS202      Half Unit
Data Science for Social Scientists

This information is for the 2021/22 session.

Teacher responsible

Prof Kenneth Benoit, PEL.4.01C


Availability

This course is compulsory on the BSc in Psychological and Behavioural Science. It is available as an outside option to students on other programmes where regulations permit, and to General Course students.

This course cannot be taken with ST201 Statistical Models and Data Analysis.


Pre-requisites

A-level maths or equivalent.

Course content

Data science and machine learning are exciting new areas that combine scientific inquiry, statistical knowledge, substantive expertise, and computer programming. One of the main challenges for businesses and policy makers when using big data is finding people with the appropriate skills. Good data science requires experts who combine substantive knowledge with data-analytic skills, which makes it a prime area for social scientists with an interest in quantitative methods.

This module extends the foundation of probability and statistics with an introduction to the most important concepts in data science and applied machine learning, with social science examples.

It will cover the main analytical methods from this field with hands-on applications using example datasets, so that students gain experience with and confidence in using the methods we cover. At the end of this module, students will have a sound understanding of the field of data science, the ability to analyse data using some of its main methods, and a solid foundation for more advanced or more specialised study.

The topics covered include:

  • the fundamentals of the data science approach, with an emphasis on social scientific analysis and the study of the social, political, and economic worlds;
  • a survey of the methods of statistical learning, and their link to more classical methods of probability and statistical inference;
  • an introduction to machine learning, including common supervised and unsupervised methods;
  • methods of evaluating and improving model performance;
  • computer programming, practised hands-on through course exercises;
  • applications to real data through hands-on exercises;
  • how to integrate the insights from data analytics into knowledge generation and decision-making;
  • an introduction to natural language processing and text analysis;
  • data visualisation through a variety of graphs.

The applications are drawn from social, political, economic, legal, and business and marketing fields.

The final week of the module will cover several applications of data science within a specific discipline, chosen to connect the module to the different groups of students taking it.


Teaching

16 hours and 40 minutes of lectures and 13 hours and 30 minutes of classes in the MT.

Classes and lectures together total 33.5 hours across the MT (counting each 50-minute lecture as 1 hour).

Reading week in Week 6.

Formative coursework

Students will be expected to produce 10 problem sets in the MT.

Students will work on weekly, structured problem sets in the staff-led class sessions. Example solutions will be provided at the end of each week.

Indicative reading

James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013) An Introduction to Statistical Learning: With Applications in R. Springer.

Grolemund, G. and Wickham, H. (2016) R for Data Science. O'Reilly Media.

Murrell, P. (2018) R Graphics. CRC Press.

Benoit, K. (2020) "Text as Data: An Overview." In Curini, L. and Franzese, R. (eds.) Handbook of Research Methods in Political Science and International Relations. Thousand Oaks: Sage, pp. 461-497.



Assessment

Exam (40%, duration: 3 hours) in the January exam period.
Coursework (60%, 2000 words) in the MT.

Course selection videos

Some departments have produced short videos to introduce their courses. Please refer to the course selection videos index page for further information.

Important information in response to COVID-19

Please note that during the 2021/22 academic year some variation to teaching and learning activities may be required to respond to changes in public health advice and/or to account for the differing needs of students in attendance on campus and those who might be studying online. For example, this may involve changes to the mode of teaching delivery and/or the format or weighting of assessments. Changes will only be made if required, and students will be notified of any changes to teaching or assessment plans at the earliest opportunity.

Key facts

Department: Data Science Institute

Total students 2020/21: Unavailable

Average class size 2020/21: Unavailable

Capped 2020/21: No

Value: Half Unit


Personal development skills

  • Self-management
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills