Course details
- Department: Data Science Institute
- Application code: SS-ME204
Apply
Applications are open
We are accepting applications. Apply early to avoid disappointment.
Overview
Data is everywhere in social science research. Whether you're analysing public policy, studying social trends, or building dashboards for your organisation, you need reliable ways to collect, clean, and organise information before you can draw meaningful conclusions.
AI tools like ChatGPT can now write code to help with many data tasks. But they can't decide which data sources are trustworthy, how to structure your database, or whether your analysis actually answers your research question. Those strategic decisions still require human judgment.
This course teaches you to build complete data systems from scratch. You'll learn to collect data automatically from websites and APIs, store it efficiently in databases, and create interactive visualisations that communicate your findings clearly. Most importantly, you'll develop the skills to stay in control of your data pipeline rather than just hoping “vibe coding” with AI tools will get it right.
The course focuses on the practical, systematic work that happens before any fancy analysis begins. In real projects, most of your time goes to getting, cleaning, and organising data rather than running statistical models or machine learning algorithms. These foundational skills let you work confidently with messy, real-world datasets.
By the end of this course, you'll have built working data systems and gained the experience to direct AI tools effectively rather than being at their mercy.
Key information
Prerequisites: We recommend that students already be familiar with computer programming at an introductory level (variables, if-else, loops, functions). We have welcomed complete beginners to this course in the past, and many have done well, but it can be a tough learning curve! We recommend focusing on Python basics if you’d like to prepare in advance. Chapters 1-5 of Automate the Boring Stuff with Python by Al Sweigart, freely available online, are a great starting resource.
Level: 200 level. Read more information on levels in our FAQs
Fees: Please see Fees and payments
Lectures: 36 hours
Classes: 18 hours
Assessment: A mid-term problem set (25%) and a final project (75%).
Typical credit: 3-4 credits (US); 7.5 ECTS points (EU)
Please note: Assessment is optional but may be required for credit by your home institution. Your home institution will be able to advise how you can meet their credit requirements. For more information on exams and credit, read Teaching and assessment.
Is this course right for you?
This course is ideal for students seeking practical experience building data systems from scratch, whether for social science research, professional analytics roles, or preparation for advanced data science coursework. It's particularly valuable if you want to learn how to maintain control over data quality and pipeline design in an AI-augmented world.
The course serves as a strategic bridge between introductory programming and advanced machine learning or statistical analysis. Students gain the systems thinking skills that employers expect but many university programmes skip – skills that let you collaborate effectively with engineers and build reliable data solutions.
You'll find this course relevant if you're starting an MSc or MBA programme and want foundational knowledge in data systems, or if you're entering roles where data collection and analysis will be core responsibilities.
Outcomes
Aims of this course:
Develop the skills to handle data challenges from start to finish: gathering information reliably, organising it efficiently, and analysing it systematically. You'll learn when to trust AI suggestions and when human oversight is essential.
Learning Objectives:
By the end of this course, students will be able to:
- Automated Data Collection: Construct data acquisition systems using RESTful APIs and web scraping frameworks
- Data Quality Assessment: Evaluate data reliability and implement validation procedures to identify errors and inconsistencies
- Data Transformation: Organise raw data into analysis-ready formats using systematic cleaning and reshaping workflows
- Efficient Processing: Execute vectorised operations for algorithmic efficiency when handling large datasets
- Database Design: Apply normalisation and schema design principles to create robust data storage systems
- SQL Database Management: Use relational database principles and SQL for efficient data storage and retrieval
- ETL Pipeline Implementation: Implement Extract, Transform, Load processes for systematic data movement across multiple sources
- Reproducible Workflows: Design data pipelines using version control systems to ensure analysis can be repeated and verified
- Interactive Visualisation: Create data visualisations and dashboard systems that effectively communicate findings
- Strategic AI Integration: Employ AI tools strategically, from chatbots such as ChatGPT, Claude, and Gemini to specialised AI coding tools (GitHub Copilot, Cursor), while maintaining human oversight of critical design decisions
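To give a flavour of how several of these objectives fit together, here is a minimal sketch of a small Extract, Transform, Load pipeline. It is not taken from the course materials: the sample records, table names, and function names are all hypothetical, and it uses only Python's built-in sqlite3 module. The hard-coded list stands in for data collected from an API or scraper.

```python
import sqlite3

# Hypothetical raw records, standing in for data collected from an API or scraper.
RAW_RECORDS = [
    {"country": " United Kingdom ", "year": "2021", "indicator": "unemployment", "value": "4.5"},
    {"country": "united kingdom", "year": "2022", "indicator": "unemployment", "value": "3.7"},
    {"country": "France", "year": "2021", "indicator": "unemployment", "value": "n/a"},
    {"country": "France", "year": "2022", "indicator": "unemployment", "value": "7.3"},
]


def transform(records):
    """Clean raw records: tidy names, convert types, set aside invalid rows."""
    clean, rejected = [], []
    for rec in records:
        try:
            clean.append({
                "country": rec["country"].strip().title(),
                "year": int(rec["year"]),
                "indicator": rec["indicator"].strip().lower(),
                "value": float(rec["value"]),  # raises ValueError for "n/a"
            })
        except (KeyError, ValueError):
            rejected.append(rec)  # keep rejected rows for later inspection
    return clean, rejected


def load(conn, rows):
    """Store rows in two tables so each country name is held only once."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS country (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS observation (
            country_id INTEGER NOT NULL REFERENCES country(id),
            year       INTEGER NOT NULL,
            indicator  TEXT NOT NULL,
            value      REAL NOT NULL,
            PRIMARY KEY (country_id, year, indicator)
        );
    """)
    for row in rows:
        conn.execute("INSERT OR IGNORE INTO country (name) VALUES (?)", (row["country"],))
        (country_id,) = conn.execute(
            "SELECT id FROM country WHERE name = ?", (row["country"],)
        ).fetchone()
        # INSERT OR REPLACE makes re-running the pipeline safe (no duplicate rows).
        conn.execute(
            "INSERT OR REPLACE INTO observation VALUES (?, ?, ?, ?)",
            (country_id, row["year"], row["indicator"], row["value"]),
        )
    conn.commit()


def run_pipeline(records, conn):
    """Run transform then load; return counts of loaded and rejected rows."""
    clean, rejected = transform(records)
    load(conn, clean)
    return len(clean), len(rejected)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    loaded, skipped = run_pipeline(RAW_RECORDS, connection)
    print(f"Loaded {loaded} rows, rejected {skipped}")
```

The rejected rows are returned rather than silently dropped, which is one simple way to keep a human in the loop on data quality decisions.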
Faculty
The design of this course is guided by LSE faculty, as well as industry experts, who will share their experience and in-depth knowledge with you throughout the course.

Dr Jonathan Cardoso-Silva
Assistant Professor (Education)
Department
The Data Science Institute (DSI) forms the institutional cornerstone of data science activity at the London School of Economics and Political Science. Working alongside the academic departments across the School, the DSI's mission is to foster the study of data science and new forms of data with a focus on their social, economic, and political aspects.
The DSI aims to host, facilitate and promote research in social and economic data science through an annual programme of seminars, workshops and research projects delivered by a range of academic experts and research students.
