DS202 Half Unit
Data Science for Social Scientists
This information is for the 2022/23 session.
Dr Jonathan Cardoso Silva, PEL 9.01C
This course is compulsory on the BSc in Politics and Data Science and BSc in Psychological and Behavioural Science. This course is available as an outside option to students on other programmes where regulations permit and to General Course students.
This course cannot be taken with ST201 Statistical Models and Data Analysis.
A-level maths or equivalent.
Data science and machine learning are exciting new areas that combine scientific inquiry, statistical knowledge, substantive expertise, and computer programming. One of the main challenges for businesses and policy makers when using big data is finding people with the appropriate skills. Good data science requires experts who combine substantive knowledge with data-analytical skills, which makes it a prime area for social scientists with an interest in quantitative methods.
This module extends the foundation of probability and statistics with an introduction to the most important concepts in applied machine learning, with social science examples.
It will cover the main analytical methods from this field with hands-on applications using example datasets, so that students gain experience with and confidence in using the methods we cover. At the end of this module, students will have a sound understanding of the field of data science, the ability to analyse data using some of its main methods, and a solid foundation for more advanced or more specialised study.
The learning objectives are to:
- Understand the fundamentals of the data science approach, with an emphasis on social scientific analysis and the study of the social, political, and economic worlds.
- Understand how classical methods such as regression analysis or principal components analysis can be treated as machine learning approaches for prediction or for data mining.
- Know how to fit and apply supervised machine learning models for classification and prediction.
- Know how to evaluate and compare fitted models, and to improve model performance.
- Develop applied computer programming skills through hands-on course exercises.
- Apply the methods learned to real data through hands-on exercises.
- Integrate the insights from data analytics into knowledge generation and decision-making.
- Understand an introductory framework for working with natural language (text) data using machine learning techniques.
- Learn how data science methods have been applied to a particular domain of study.
16 hours and 40 minutes of lectures and 13 hours and 30 minutes of classes in the MT.
This course will have a reading week in Week 6.
Students will be expected to produce 10 problem sets in the MT.
Students will work on weekly, structured problem sets in the staff-led class sessions. Example solutions will be provided at the end of each week.
James, G., Witten, D., Hastie, T. and Tibshirani, R. (2021) An Introduction to Statistical Learning: With Applications in R. 2nd Edition. Springer.
Wickham, H. and Grolemund, G. (2016) R for Data Science. O'Reilly Media.
Murrell, P. (2018) R Graphics. CRC Press.
Benoit, K. (2020) “Text as Data: An Overview.” In Curini, L. and Franzese, R., eds. Handbook of Research Methods in Political Science and International Relations. Thousand Oaks: Sage. pp. 461-497.
Exam (40%, duration: 3 hours) in the January exam period.
Coursework (60%, 2000 words) in the MT.
Department: Data Science Institute
Total students 2021/22: 46
Average class size 2021/22: 16
Capped 2021/22: No
Value: Half Unit
Personal development skills
- Problem solving
- Application of information skills
- Application of numeracy skills
- Specialist skills