Short course: Data Engineering Principles for the Social Sciences

Department: Data Science Institute
Application code: SS-ME204

Dates

Session one: Not running in 2026

Session two: Closed (13 Jul 2026 - 31 Jul 2026)

Session three: Not running in 2026

Apply

Applications are closed

We are not currently accepting applications for this course. Register your interest below to be notified when applications open again.

Overview

Data is everywhere in social science research. Whether you're analysing public policy, studying social trends, or building dashboards for your organisation, you need reliable ways to collect, clean, and organise information before you can draw meaningful conclusions.

AI tools like ChatGPT can now write code to help with many data tasks. But they can't decide which data sources are trustworthy, how to structure your database, or whether your analysis actually answers your research question. Those strategic decisions still require human judgment.

This course teaches you to build complete data systems from scratch. You'll learn to collect data automatically from websites and APIs, store it efficiently in databases, and create interactive visualisations that communicate your findings clearly. Most importantly, you'll develop the skills to stay in control of your data pipeline rather than just hoping “vibe coding” with AI tools will get it right.

The course focuses on the practical, systematic work that happens before any fancy analysis begins. In real projects, most of your time goes to getting, cleaning, and organising data rather than running statistical models or machine learning algorithms. These foundational skills let you work confidently with messy, real-world datasets.

By the end of this course, you'll have built working data systems and gained the experience to direct AI tools effectively rather than being at their mercy.

Hear from some of our alumni and discover why 99% of Summer School students would recommend us.

Key information

Prerequisites: We recommend that students already be familiar with computer programming at an introductory level (variables, if-else, loops, functions). We have welcomed complete beginners to this course in the past, and many have done well, but it can be a steep learning curve! We recommend focusing on Python basics if you’d like to prepare in advance. Chapters 1-5 of Automate the Boring Stuff with Python by Al Sweigart are a great starting resource and are freely available online.

Level: 200 level. Read more information on levels in our FAQs.

Fees: Please see Fees and payments

Lectures: 36 hours

Classes: 18 hours

Assessment: A mid-term problem set (25%) and a final project (75%).

Typical credit: 3-4 credits (US); 7.5 ECTS points (EU)

Please note: Assessment is optional but may be required for credit by your home institution. Your home institution will be able to advise how you can meet their credit requirements. For more information on exams and credit, read Teaching and assessment.

Is this course right for you?

This course is ideal for students seeking practical experience building data systems from scratch, whether for social science research, professional analytics roles, or preparation for advanced data science coursework. It's particularly valuable if you want to learn how to maintain control over data quality and pipeline design in an AI-augmented world.

The course serves as a strategic bridge between introductory programming and advanced machine learning or statistical analysis. Students gain the systems thinking skills that employers expect but many university programmes skip – skills that let you collaborate effectively with engineers and build reliable data solutions.

You'll find this course relevant if you're starting an MSc or MBA programme and want foundational knowledge in data systems, or if you're entering roles where data collection and analysis will be core responsibilities.

Outcomes

Aims of this course:

Develop the skills to handle data challenges from start to finish: gathering information reliably, organising it efficiently, and analysing it systematically. You'll learn when to trust AI suggestions and when human oversight is essential.

Learning Objectives:

By the end of this course, students will be able to:

Automated Data Collection: Construct data acquisition systems using RESTful APIs and web scraping frameworks
Data Quality Assessment: Evaluate data reliability and implement validation procedures to identify errors and inconsistencies
Data Transformation: Organise raw data into analysis-ready formats using systematic cleaning and reshaping workflows
Efficient Processing: Execute vectorised operations for algorithmic efficiency when handling large datasets
Database Design: Apply normalisation and schema design principles to create robust data storage systems
SQL Database Management: Use relational database principles and SQL for efficient data storage and retrieval
ETL Pipeline Implementation: Implement Extract, Transform, Load processes for systematic data movement across multiple sources
Reproducible Workflows: Design data pipelines using version control systems to ensure analysis can be repeated and verified
Interactive Visualisation: Create data visualisations and dashboard systems that effectively communicate findings
Strategic AI Integration: Employ AI tools strategically, from chatbots (ChatGPT, Claude, Gemini) to specialised AI coding tools (GitHub Copilot, Cursor), while maintaining human oversight of critical design decisions
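
To give a flavour of what several of these objectives look like together, here is a minimal sketch of an Extract, Transform, Load pipeline in Python. It is an illustration only and is not taken from the course materials: the turnout figures are made up, the JSON text stands in for a real API response, and every function and table name is hypothetical. It uses only Python's standard library (json and sqlite3).

```python
import json
import sqlite3

# Hypothetical stand-in for a JSON payload returned by a REST API.
# The figures are invented for illustration.
RAW_RESPONSE = """
[
  {"country": "UK", "year": 2020, "turnout": 67.3},
  {"country": "UK", "year": 2021, "turnout": null},
  {"country": "FR", "year": 2020, "turnout": 48.7},
  {"country": "FR", "year": 2020, "turnout": 148.7}
]
"""


def extract(raw):
    """Extract: parse the raw JSON text into a list of dicts."""
    return json.loads(raw)


def is_valid(record):
    """Validate: turnout must be a number between 0 and 100."""
    value = record.get("turnout")
    return isinstance(value, (int, float)) and 0 <= value <= 100


def transform(records):
    """Transform: keep valid records as (country, year, turnout) tuples."""
    return [(r["country"], r["year"], r["turnout"]) for r in records if is_valid(r)]


def load(rows, conn):
    """Load: store rows in two normalised tables (country, observation)."""
    conn.execute("CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute(
        "CREATE TABLE observation ("
        "country_id INTEGER REFERENCES country(id), year INTEGER, turnout REAL)"
    )
    for name, year, turnout in rows:
        # Each country name is stored once; observations refer to it by id.
        conn.execute("INSERT OR IGNORE INTO country (name) VALUES (?)", (name,))
        (country_id,) = conn.execute(
            "SELECT id FROM country WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO observation VALUES (?, ?, ?)", (country_id, year, turnout)
        )
    conn.commit()


def run_pipeline(raw=RAW_RESPONSE):
    """Run extract, transform and load, then read the result back with SQL."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(raw)), conn)
    return conn.execute(
        "SELECT c.name, o.year, o.turnout FROM observation o "
        "JOIN country c ON c.id = o.country_id ORDER BY c.name, o.year"
    ).fetchall()


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

Of the four made-up records, the validation step drops the one with a missing value and the one with an impossible turnout of 148.7, so only two rows reach the database. Deciding what counts as a valid record is exactly the kind of judgment the objectives above describe as needing human oversight.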

I enjoyed that the course was practical. All of the theory we learned in lectures was then applied in classes, and the reinforcement of the ideas really helped me to learn.

Faculty

The design of this course is guided by LSE faculty, as well as industry experts, who will share their experience and in-depth knowledge with you throughout the course.

Department

The Data Science Institute (DSI) forms the institutional cornerstone of data science activity at the London School of Economics and Political Science. Working alongside the academic departments across the School, the DSI's mission is to foster the study of data science and new forms of data with a focus on their social, economic, and political aspects.

The DSI aims to host, facilitate and promote research in social and economic data science through an annual programme of seminars, workshops and research projects delivered by a range of academic experts and research students.

Related Courses

ME314: Introduction to Data Science and Machine Learning

ME315: Machine Learning in Practice