DS105A      Half Unit
Data for Data Science

This information is for the 2025/26 session.

Course Convenor

Dr Jon Cardoso Silva

Availability

This course is available on the Erasmus Reciprocal Programme of Study and Exchange Programme for Students from University of California, Berkeley. This course is freely available as an outside option to students on other programmes where regulations permit. It does not require permission. This course is freely available to General Course students. It does not require permission.

Students are expected to take part, in person, in a group presentation in the last week of the Autumn Term.

Requisites

Mutually exclusive courses:

This course cannot be taken with DS105W at any time on the same degree programme.

Additional requisites:

There are no pre-requisites for this course beyond a strong willingness to learn how to code in Python.

Course content

What the course is about: This course teaches students how to transform, manipulate and analyse 'real data' through a fully hands-on, practical approach from the very beginning. Students are introduced to Python programming in a way that welcomes coding beginners, with weekly coding tasks that build foundational skills progressively. Live coding demonstrations guide students through practical exercises, which they can follow in real time on their own devices.

The Intended Learning Outcomes of this course (what you can expect to learn) are:

  • Master the fundamentals of data types, structures, and common data formats
  • Apply Python and pandas to clean, reshape and transform raw data
  • Design and implement practical data analysis workflows
  • Use Git and GitHub for version control and collaborative workflows
  • Identify and resolve common data quality issues
  • Integrate data from multiple sources
  • Understand database normalisation concepts and basic SQL queries
  • Create visualisations using lets-plot to apply grammar-of-graphics principles
  • Critically evaluate visualisations and distinguish between correlation and causation
  • Understand the fundamentals of markup languages, including HTML, and the Markdown format for formatting documents and web pages
  • Create and maintain simple websites using HTML and CSS
  • Craft clear, accurate and responsible data reports
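
To give a flavour of the outcome on database normalisation and basic SQL queries, here is a minimal sketch that is not taken from the course materials. It uses Python's built-in sqlite3 module; the table names (customers, orders), the column names and all rows are hypothetical and chosen only for illustration. Customer details are stored once in their own table, and a JOIN brings the two tables back together when a summary is needed.

```python
import sqlite3

# In-memory database; every table, column and row below is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount REAL NOT NULL
    );
    """
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 20.0), (11, 1, 5.0), (12, 2, 12.5)],
)

# Join the two tables and total the order amounts per customer.
query = """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.name
"""
totals = conn.execute(query).fetchall()
print(totals)
conn.close()
```

Because each customer's name lives in a single row of customers, correcting it requires one update rather than a change to every order.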

The course emphasises critical thinking in data analysis, ensuring students can not only produce visualisations but also critically evaluate them. This includes understanding the difference between correlation and causation, avoiding misleading representations, and communicating insights responsibly. The course content incorporates diverse datasets and case studies, and encourages students to apply ethical considerations in their data work, including the responsible use of generative AI tools in the data science workflow.

Notes

  • Older iterations of this course can be seen on the course's public website.
  • Please note that starting from the 2024/25 academic year, more advanced data acquisition techniques such as web scraping are covered in less depth in DS105. Students interested in these topics are encouraged to take DS205 (Advanced Data Manipulation), which explores these methods in more detail.

Teaching

20 hours of lectures and 15 hours of classes in the Autumn Term.

This course has a reading week in Week 6 of Autumn Term.

  • Students will be able to attend weekly 3-hour "drop-in sessions" to get direct support from teaching staff as they progress with their weekly formative problem sets and summative assignments.
  • Students will be able to interact more informally with the course leader, class teachers and teaching admin officer via the group chat channels created at the start of the Autumn Term. 

Formative assessment

Students will be expected to produce two problem sets in the Autumn Term.

Achieving proficiency in data science skills, much like programming in general, relies heavily on consistent and continuous practice. To facilitate this, we release these two structured problem sets very early in the course (around Weeks 02 & 04). These exercises are closely tied to in-class activities and follow the same submission structure as the graded problem sets that will be introduced after Reading Week.

Example exercises include navigating the computer terminal, accessing computer servers, and writing code to read and save data.
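
As a rough idea of what "writing code to read and save data" can look like, the sketch below uses only Python's standard library. It is not an actual course exercise: the file names, field names and sample records are hypothetical. It saves a few records as CSV, reads them back while converting text to numbers, and then saves and reloads the result as JSON.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical sample records, invented for illustration only.
rows = [
    {"site": "A", "reading": "11.5"},
    {"site": "B", "reading": "17.0"},
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "readings.csv"
    json_path = Path(tmp) / "readings.json"

    # Save the records as CSV.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["site", "reading"])
        writer.writeheader()
        writer.writerows(rows)

    # Read the CSV back; every value arrives as a string, so convert the number.
    with csv_path.open(newline="", encoding="utf-8") as f:
        loaded = [
            {"site": r["site"], "reading": float(r["reading"])}
            for r in csv.DictReader(f)
        ]

    # Save the typed records as JSON and read them back.
    json_path.write_text(json.dumps(loaded, indent=2), encoding="utf-8")
    round_trip = json.loads(json_path.read_text(encoding="utf-8"))

print(round_trip)
```

CSV stores everything as text, which is why the reading is converted explicitly; JSON keeps the distinction between numbers and strings.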

 

Indicative reading

Core Textbooks

Python programming and pandas

  • [Indicative] McKinney, W. (2022). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (3rd ed.). O'Reilly Media. Available online.
    Written by pandas' creator, this definitive guide covers modern pandas practices (2.0+) with numerous practical examples.
  • [Recommended] VanderPlas, J. (2023). Python Data Science Handbook: Essential Tools for Working with Data (2nd ed.). O'Reilly Media.
    This updated edition provides excellent coverage of the Python data science ecosystem, particularly strong on data manipulation with pandas and visualization with Matplotlib.
  • [Recommended] Janssens, J. (2021). Data Science at the Command Line (2nd ed.). O'Reilly Media. Available online.
    Offers a unique command-line perspective on data science workflows that complements Python-based approaches, with practical tools for data manipulation and exploration.

Data visualization and critical evaluation

  • [Indicative] Wilke, C. O. (2019). Fundamentals of Data Visualization. O'Reilly Media. Available online.
    Provides essential principles for creating accurate, effective visualizations with practical guidance on avoiding common pitfalls and misrepresentations.
  • [Recommended] Cairo, A. (2019). How Charts Lie: Getting Smarter about Visual Information. W.W. Norton & Company.
    Examines how visualizations can mislead and provides frameworks for critical evaluation, ideal for teaching students to avoid misleading representations.

Data ethics and critical thinking

  • [Indicative] Bergstrom, C. T., & West, J. D. (2020). Calling Bullshit: The Art of Skepticism in a Data-Driven World. Random House.
    Focuses on critical thinking skills for evaluating data-driven claims with particular emphasis on understanding correlation vs. causation—a core requirement of the course.

Assessment

Problem sets (60%)

Project (40%)

The problem sets involve creating computational notebooks (Jupyter or Quarto notebooks) to showcase the coding and documentation skills gained throughout the course. Problem sets typically consist of two parts, with one submission around Week 07 and another around Week 09. The group project will consist of a pitch presentation (Week 11) and a final public report in the form of a public website (Winter Term, around Week 04).


Key facts

Department: Data Science Institute

Course Study Period: Autumn Term

Unit value: Half unit

FHEQ Level: Level 4

CEFR Level: Null

Keywords: programming, data science, generative ai tools

Total students 2024/25: 100

Average class size 2024/25: 20

Capped 2024/25: No

Personal development skills

  • Self-management
  • Team working
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills