DS202W Half Unit
Data Science for Social Scientists
This information is for the 2025/26 session.
Course Convenor
Dr Ghita Berrada
Availability
This course is compulsory on the BSc in Psychological and Behavioural Science. This course is available on the Erasmus Reciprocal Programme of Study and the Exchange Programme for Students from the University of California, Berkeley. This course is freely available as an outside option to students on other programmes where regulations permit. It does not require permission. This course is freely available to General Course students. It does not require permission.
This course cannot be taken with DS202A or ST201 Statistical Models and Data Analysis.
Material from the previous year can be found on the course's dedicated public webpage: https://lse-dsi.github.io/DS202/
Requisites
Mutually exclusive courses:
This course cannot be taken with DS202A or ST201 at any time on the same degree programme.
Additional requisites:
A-level maths or equivalent.
An important note on programming: While programming is not strictly a prerequisite for this course, basic programming knowledge, preferably in Python or R, is highly recommended. Students should be comfortable creating and updating variables, writing simple functions, and using control-flow constructs such as if-else statements and for and while loops. Students who arrive with these skills will have more time to develop their programming further during the course. We recommend that students with limited programming experience explore courses such as DS205, DS105A, DS105W or ST101, the Digital Skills Lab workshops, or the self-paced pre-sessional course listed on the DS202 Moodle page.
Note: This iteration of the course is taught in Python.
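As an illustration only, the baseline skills listed above look roughly like the snippet below in Python. It is a hypothetical self-check written for this page, not part of the course materials; the function names `mean` and `label` and the pass threshold of 50 are invented for the example.

```python
# Hypothetical self-check of the baseline skills above; not course material.

def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0
    for v in values:            # for loop over a list
        total = total + v       # updating a variable
    return total / len(values)


def label(score, threshold=50):
    """Return "pass" or "fail" using an if-else statement."""
    if score >= threshold:
        return "pass"
    else:
        return "fail"


scores = [72, 45, 88, 50]
avg = mean(scores)

labels = []
for s in scores:
    labels.append(label(s))

# while loop: count the passing scores before the first failing one
i = 0
while i < len(scores) and label(scores[i]) == "pass":
    i += 1

print(avg)     # 63.75
print(labels)  # ['pass', 'fail', 'pass', 'pass']
print(i)       # 1
```

If every line of this snippet reads as familiar, you are likely close to the recommended baseline; if not, the resources listed above are a reasonable place to start.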
Course content
The main goal of this course is to provide students with a hands-on introduction to the most fundamental machine learning algorithms, to the metrics commonly used to assess algorithmic performance, and to the decision-making considerations that arise when these methods are applied in real-life scenarios. The course is taught through a combination of staff-led lectures and classes, with a primary focus on practical applications. Python is the primary programming language, and the first weeks of the course include a recap of the pandas and scikit-learn packages.
In terms of content, the learning objectives of this course are to:
- Understand the fundamentals of the data science approach, with an emphasis on social scientific analysis and the study of the social, political, and economic worlds;
- Understand how classical methods such as regression analysis or principal components analysis can be treated as machine learning approaches for prediction or for data mining;
- Know how to fit and apply supervised machine learning models for classification and prediction;
- Know how to evaluate and compare fitted models, and to improve model performance;
- Use computer programming in an applied way, through hands-on course exercises;
- Apply the methods learned to real data through hands-on exercises;
- Integrate the insights from data analytics into knowledge generation and decision-making;
- Understand an introductory framework for working with natural language (text) data using machine learning techniques;
- Learn how data science methods have been applied to a particular domain of study (applications).
Note: another iteration of this course, DS202A, is taught in R in the Autumn Term. The material for both versions of the course will be made available to students registered on either version.
Teaching
15 hours of classes and 20 hours of lectures in the Winter Term.
This course has a reading week in Week 6 of Winter Term.
Formative assessment
As with programming, achieving proficiency in data analysis, modelling and machine learning requires constant and consistent practice. To support this, a structured problem set is released early in the course (around Week 4). Its exercises are closely tied to in-class activities and follow the same submission structure as the graded problem sets introduced after Reading Week.
Indicative reading
- James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning: With Applications in Python. 1st edition. Springer Texts in Statistics. New York [NY]: Springer, 2023. Made freely available online by the authors: https://www.statlearning.com/.
- Abu-Mostafa, Yaser S., Malik Magdon-Ismail, and Hsuan-Tien Lin. Learning from data. Vol. 4. New York: AMLBook, 2012.
- Raschka, Sebastian, and Vahid Mirjalili. Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow 2. United Kingdom: Packt Publishing, 2019.
- VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data. 2nd edition. Beijing Boston Farnham Sebastopol Tokyo: O'Reilly, 2023. Made freely available online by the author: https://jakevdp.github.io/PythonDataScienceHandbook/.
- Grimmer, Justin, Margaret E. Roberts, and Brandon M. Stewart. Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton: Princeton University Press, 2022.
- Sarkar, Dipanjan. Text Analytics with Python: A Practitioner's Guide to Natural Language Processing. Germany: Apress, 2019.
Assessment
Problem sets (60%)
Project (40%)
This component of assessment includes an element of group work.
Key facts
Department: Data Science Institute
Course Study Period: Winter Term
Unit value: Half unit
FHEQ Level: Level 5
CEFR Level: Null
Total students 2024/25: 47
Average class size 2024/25: 12
Capped 2024/25: No
Course selection videos
Some departments have produced short videos to introduce their courses. Please refer to the course selection videos index page for further information.
Personal development skills
- Self-management
- Team working
- Problem solving
- Application of information skills
- Communication
- Application of numeracy skills
- Specialist skills