MY472 Half Unit
Data for Data Scientists

This information is for the 2022/23 session.

Teacher responsible

Dr Friedrich Geiecke

Availability

This course is available on the MSc in Applied Social Data Science, MSc in Human Geography and Urban Studies (Research), MSc in Media and Communications (Data and Society) and MSc in Social Research Methods. This course is available with permission as an outside option to students on other programmes where regulations permit.

This course is not controlled access. If you register for a place and meet the prerequisites, if any, you are likely to be given a place.

Course content

This course will cover the principles of digital methods for storing and structuring data, including data types, relational and non-relational database design, and query languages. Students will learn to build, populate, manipulate and query databases based on datasets relevant to their fields of interest. The course will also cover workflow management for typical data transformation and cleaning projects; such work is frequently the starting point and the most time-consuming part of any data science project. This course uses a project-based learning approach towards the study of online publishing and group-based collaboration, essential ingredients of modern data science projects. The coverage of data sharing will include key skills in online publishing: the elements of web design, the technical elements of web technologies and web programming, and the use of version-control and group collaboration tools such as GitHub. Each student will build one or more interactive websites based on content relevant to their domain-related interests, and will use GitHub for accessing and submitting course materials and assignments.

In this course, we introduce principles and applications of the electronic storage, structuring, manipulation, transformation, extraction, and dissemination of data. This includes data types, database design, database implementation, and data analysis through structured queries. Through join operations, we will also cover the challenges of data linkage and how to combine datasets from different sources. We begin by discussing fundamental data types and how data is stored and recorded electronically. We will cover database design, especially relational databases, using substantive examples across a variety of fields. Students are introduced to SQL through MySQL, and programming assignments in this unit of the course will be designed to ensure that students learn to create, populate and query an SQL database. For comparison, we will introduce NoSQL using MongoDB and the JSON data format. For both types of database, students will be encouraged to work with data relevant to their own interests as they learn to create, populate and query data. In the final part of the data section of the course, we will step through a complete workflow, including data cleaning and transformation, illustrating many of the practical challenges faced at the outset of any data analysis or data science project.

Online publishing and collaboration tools, along with the technologies that underlie them, form the second part of this course. Students will develop interactive and secure projects for the World Wide Web using both client-side and server-side technologies. Collaboration, as well as the dissemination and submission of course assignments, will use GitHub, the popular code hosting platform built on the Git version control system. This part of the course begins with an in-depth look at HTML and CSS, the markup and stylesheet languages that form the foundations of building websites. Students next study basic programming in JavaScript, which provides both client-side and server-side tools, and learn to customize web content with Bootstrap and publish web pages with Jekyll; this work provides the basis for a class project.

Teaching

This course is delivered through a combination of classes and lectures totalling a minimum of 20 hours across Michaelmas Term.

This course has a reading week in Week 6 of MT.

Formative coursework

Students will be expected to produce 10 problem sets in the MT.

Students will work on weekly, structured problem sets in the staff-led class sessions. Example solutions will be provided at the end of each week.

Indicative reading

  • Chodorow, Kristina. MongoDB: The Definitive Guide, 2nd Edition. O'Reilly, 2013.
  • Churcher, Clare. Beginning Database Design: From Novice to Professional. Apress, 2007.
  • Tahaghoghi, Seyed M. and Hugh E. Williams. Learning MySQL. O'Reilly, 2006.
  • Karumanchi, Narasimha. Data Structures and Algorithms Made Easy: Data Structure and Algorithmic Puzzles, Second Edition. CreateSpace Independent Publishing Platform, 2011.
  • Lee, Kent. Data Structures and Algorithms with Python. Springer, 2015.
  • Lake, Peter. Concise Guide to Databases: A Practical Introduction. Springer, 2013.
  • Nield, Thomas. Getting Started with SQL: A Hands-On Approach for Beginners. O'Reilly, 2016.
  • Byron, Angela, Addison Berry, Nathan Haug, Jeff Eaton, James Walker and Jeff Robbins. Using Drupal: Choosing and Configuring Modules to Build Dynamic Websites. O'Reilly Media, 2008.
  • Duckett, Jon. HTML and CSS: Design and Build Websites. New York: Wiley, 2011.
  • Duckett, Jon. JavaScript and JQuery: Interactive Front-End Web Development. New York: Wiley, 2014.
  • Rice, Dylan. Twitter Bootstrap In Your Pocket. CreateSpace Independent Publishing Platform, 2016.
  • Sklar, David. Learning PHP 5. O'Reilly, 2004.
  • GitHub Guides, including "Understanding the GitHub Flow", "Hello World" and "Getting Started with GitHub Pages".
  • Jacobson, Daniel. APIs: A Strategy Guide. O'Reilly, 2012.
  • Loudon, Kyle. Developing Large Web Applications: Producing Code That Can Grow and Thrive. O'Reilly, 2010.

Assessment

Take-home assessment (50%) and problem sets (50%) in the MT.

Key facts

Department: Methodology

Total students 2021/22: 46

Average class size 2021/22: 22

Controlled access 2021/22: No

Value: Half Unit

Personal development skills

  • Self-management
  • Team working
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills