DS202 Half Unit
Data Science for Social Scientists
This information is for the 2021/22 session.
Prof Kenneth Benoit PEL.4.01C
This course is compulsory on the BSc in Psychological and Behavioural Science. This course is available as an outside option to students on other programmes where regulations permit and to General Course students.
This course cannot be taken with ST201 Statistical Models and Data Analysis.
A-level maths or equivalent.
Data science and machine learning are exciting new areas that combine scientific inquiry, statistical knowledge, substantive expertise, and computer programming. One of the main challenges for businesses and policy makers when using big data is to find people with the appropriate skills. Good data science requires experts who combine substantive knowledge with data-analytical skills, which makes it a prime area for social scientists with an interest in quantitative methods.
This module extends the foundation of probability and statistics with an introduction to the most important concepts in data science and applied machine learning, with social science examples.
It will cover the main analytical methods from this field with hands-on applications using example datasets, so that students gain experience with and confidence in using the methods we cover. At the end of this module, students will have a sound understanding of the field of data science, the ability to analyse data using some of its main methods, and a solid foundation for more advanced or more specialised study.
The topics covered include:
- the fundamentals of the data science approach, with an emphasis on social scientific analysis and the study of the social, political, and economic worlds;
- a survey of the methods of statistical learning, and their links to more classical methods of probability and statistical inference;
- an introduction to machine learning, including common supervised and unsupervised methods;
- methods of evaluating and improving model performance;
- computer programming, practised hands-on through course exercises;
- applications to real data through hands-on exercises;
- how to integrate the insights from data analytics into knowledge generation and decision-making;
- an introduction to natural language processing and text analysis;
- data visualisation through a variety of graphs.
The applications are drawn from social, political, economic, legal, and business and marketing fields.
The final week of the module will cover several discipline-specific applications of data science, designed to link the module to the fields of the different groups of students taking it.
16 hours and 40 minutes of lectures and 13 hours and 30 minutes of classes in the MT.
Classes and lectures together total 33.5 hours across the MT (counting each 50-minute lecture as 1 hour).
Reading week in Week 6.
Students will be expected to produce 10 problem sets in the MT.
Students will work on weekly, structured problem sets in the staff-led class sessions. Example solutions will be provided at the end of each week.
James, G. et al. (2013). An Introduction to Statistical Learning: With Applications in R. Springer.
Grolemund, G. and Wickham, H. (2016). R for Data Science. O'Reilly Media.
Murrell, P. (2018). R Graphics. CRC Press.
Benoit, K. (2020). “Text as Data: An Overview.” In Curini, L. and Franzese, R., eds. Handbook of Research Methods in Political Science and International Relations. Thousand Oaks: Sage. pp. 461-497.
Exam (40%, duration: 3 hours) in the January exam period.
Coursework (60%, 2000 words) in the MT.
Department: Data Science Institute
Total students 2020/21: Unavailable
Average class size 2020/21: Unavailable
Capped 2020/21: No
Value: Half Unit
Personal development skills
- Problem solving
- Application of information skills
- Application of numeracy skills
- Specialist skills