MY572      Half Unit
Data for Data Scientists

This information is for the 2025/26 session.

Course Convenor

Ryan Hubert

Availability

This course is freely available as an outside option to students on other programmes where regulations permit. It does not require permission.

This course is not controlled access. If you register for a place and meet the prerequisites, if any, you are likely to be given a place.

Course content

This course provides a rigorous introduction to digital methods for acquiring, structuring, storing, managing, and presenting data in social science research and data science applications. Framed around the life cycle of a data-intensive project, the course guides students through the key stages of working with data—from acquisition and organisation to storage, dissemination, and presentation. For data acquisition, students learn to collect digital data through static and dynamic web scraping, APIs, and online databases. They also learn essential techniques for cleaning, manipulating, and integrating data across tabular (CSV), hierarchical (JSON, XML, RSS), and web-based (HTML) formats. The course covers principles of data storage, including encoding methods and database management using relational (e.g. SQL) and non-relational (e.g. MongoDB) systems. Students gain experience in high-quality data visualisation using R and develop skills to summarise and evaluate descriptive statistics to present data effectively. The course introduces cloud computing for scalable data processing, allowing students to handle large datasets and replicate real-world workflows. Students also learn how to use generative AI tools to enhance and streamline data science workflows. Throughout, students do extensive coding in R, with exposure to SQL and basic web scripting. They also develop proficiency in version control using Git and GitHub for collaborative work and coursework submission.

Teaching

15 hours of seminars and 20 hours of lectures in the Autumn Term.

This course has a reading week in Week 6 of Autumn Term.

Formative assessment

Students will use programming techniques taught in the course for coding-based exercises focused on real-world data challenges.

 

Indicative reading

Beaulieu, Alan. 2020. Learning SQL: Generate, Manipulate, and Retrieve Data. 3rd Ed. O’Reilly.

Healy, Kieran. 2019. Data Visualization: A Practical Introduction. Princeton University Press. https://socviz.co/.

Lazer, David and Jason Radford. 2017. “Data ex Machina: Introduction to Big Data.” Annual Review of Sociology 43: 19-39. https://doi.org/10.1146/annurev-soc-060116-053457.

Munzert, Simon, Christian Rubba, Peter Meißner, and Dominic Nyhuis. 2015. Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Wiley.

Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 2nd Ed. O’Reilly. https://r4ds.hadley.nz/.

Assessment

Practical test (100%)

In-person, computer-based assessment where students apply programming and core data science skills to structured, real-world tasks. The assessment emphasises analytical reasoning, coding accuracy, and practical problem-solving relevant to academic research and professional data work. Marking of these assessments will be at a level appropriate for PhD students.


Key facts

Department: Methodology

Course Study Period: Autumn Term

Unit value: Half unit

FHEQ Level: Level 8

CEFR Level: Null

Total students 2024/25: 3

Average class size 2024/25: 1

Controlled access 2024/25: No

Personal development skills

  • Self-management
  • Team working
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills