DS105L      Half Unit
Data for Data Science

This information is for the 2022/23 session.

Teacher responsible

Dr Jonathan Cardoso Silva PEL 9.01C

Availability

This course is available on the BSc in Politics and Data Science. This course is available as an outside option to students on other programmes where regulations permit and to General Course students.

This course is not capped. Any student who requests a place is likely to be given one.

Course content

This course covers the fundamentals of data, with the aim of understanding how data is generated, how it is collected, how it must be transformed for use and storage, how it is stored, and how it can be retrieved and communicated. The course also covers workflow management for typical data transformation and cleaning projects, which are frequently the starting point and the most time-consuming part of any data science project. It takes a project-based learning approach to the study of online publishing and group-based collaboration, both essential ingredients of modern data science projects.

The course introduces the principles and applications of the electronic storage, structuring, manipulation, transformation, extraction, and dissemination of data. This includes data types, how data is stored and recorded electronically, and the concept and fundamentals of databases. It also covers how data is formatted and communicated. It presents basic methods for obtaining data from the Internet, including simple methods for web scraping and the use of APIs to submit queries that return structured data. Finally, it covers methods for formatting and publishing data.

Sharing and publishing data also form a key part of this module. This covers key skills in online publishing, including the elements of web design and the technical elements of web technologies and web programming, as well as the use of revision-control and group collaboration tools such as GitHub. Each student will build an interactive website based on content relevant to their domain-related interests, and will use GitHub for accessing and submitting course materials and assignments. The final project will involve group work to create a data-based website published on GitHub.

This module is not designed to be a hands-on introduction to the use of databases, but it does introduce the concepts of databases. For more detailed learning on databases, students are encouraged to take ST207 Databases.

Teaching

16 hours and 40 minutes of lectures and 13 hours and 30 minutes of classes in the LT.

A combination of classes and lectures totalling a minimum of 33.5 hours (counting 50 mins as an hour) across Lent Term, with a reading week in Week 6.

Formative coursework

In the initial sessions, students will work on weekly, structured problem sets in the staff-led class sessions. Example exercises include setting up an account, repositories, and pages on GitHub, and accessing the terminal and computer servers.

Later on, students will be expected to work on their group projects in the staff-led class sessions.

Indicative reading

  • Duckett, Jon. HTML and CSS: Design and Build Websites. New York: Wiley, 2011.
  • Lake, Peter. Concise Guide to Databases: A Practical Introduction. Springer, 2013.
  • GitHub Guides at https://guides.github.com, including: “Understanding the GitHub Flow”, “Hello World”, and “Getting Started with GitHub Pages”.
  • Jacobson, Daniel. APIs: A Strategy Guide. O'Reilly, 2012.
  • Zafarani, R., Abbasi, M. A. and Liu, H. (2014) Social Media Mining: An introduction. Cambridge University Press.
  • Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt.
  • Kitchin, R. (2014). The data revolution: Big data, open data, data infrastructures and their consequences. Sage.

Assessment

Coursework (60%, 1000 words) and group project (40%) in the ST.

Key facts

Department: Data Science Institute

Total students 2021/22: 9

Average class size 2021/22: 9

Capped 2021/22: No

Lecture capture used 2021/22: Yes (LT)

Value: Half Unit

Personal development skills

  • Self-management
  • Team working
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills