Introduction to Data Science and Machine Learning

  • Summer schools
  • Department of Methodology
  • Application code SS-ME314
  • Starting 2022
  • Short course: Open
  • Location: Houghton Street, London

Data Science and Big Data Analytics are exciting new areas that combine scientific inquiry, statistical knowledge and computer programming. Organisations are turning to customer data in order to innovate and respond quickly to shifts in the market. Meanwhile, governments are using data to help guide policy decisions, making this a prime area for social scientists with an interest in quantitative methods.

This course aims to provide an introduction to the quantitative analysis of data, blending classical statistical methods with recent advances in computational and machine learning. You will cover key topics such as the challenges of analysing big data using statistical methods, and how machine learning and data science can aid in knowledge generation and improve decision-making. You will also explore quantitative methods of text analysis, including mining social media and other online resources.

Working with leading faculty, you will cover the main analytical methods in this field and apply them hands-on to example datasets. As a result, you will gain experience and confidence in using these methods in different contexts.

Session: Two 
Dates: 11 July - 29 July 2022
Lecturers: Professor Kenneth Benoit and Dr Jack Blumenau

Programme details

Key facts

Level: 300 level. Read more information on levels in our FAQs

Fees: Please see Fees and payments

Lectures: 36 hours 

Classes: 18 hours

Assessment*: Two take-home assessments

Typical credit**: 3-4 credits (US), 7.5 ECTS points (EU)

*Assessment is optional but may be required for credit by your home institution. Your home institution will be able to advise how you can meet their credit requirements.

For more information on exams and credit, read Teaching and assessment


Prerequisites

Students should already be familiar with quantitative methods at an introductory level, including linear regression analysis. Familiarity with computer programming or database structures is a benefit, but not formally required.

If you are not already using R, we strongly encourage you to familiarise yourself with it before the start of the course. This will enable you to spend less time learning the tools and more time focussing on the methods.

Key topics

  • The challenges of analysing big data using statistical methods

  • Knowledge generation and decision-making

  • Data acquisition, processing, conversion and storage

  • Normalising data

  • Data management using SQLite

  • Statistical inference

  • Probability distributions, modelling and experimental design

  • Assessing model fit and cross-validating predictive models

  • Supervised and unsupervised learning approaches

  • Quantitative methods of text analysis

  • Social network analysis

  • Data visualisation
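As a rough illustration of the "Normalising data" and "Data management using SQLite" topics, the sketch below splits a flat list of purchases into two related tables and then joins them again with an SQL query. It is written in Python with the standard-library sqlite3 module purely for illustration (the course itself uses R), and the table names, column names and sample rows are all hypothetical.

```python
import sqlite3


def build_normalised_db():
    """Load flat (customer name, amount) rows into two normalised tables."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.executescript(
        """
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL UNIQUE
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            amount      REAL NOT NULL
        );
        """
    )
    # Hypothetical flat data: the customer name is repeated on every row.
    flat_rows = [("Ana", 12.5), ("Ben", 7.0), ("Ana", 30.0)]
    for name, amount in flat_rows:
        # Store each customer once; repeated names are ignored.
        conn.execute("INSERT OR IGNORE INTO customers (name) VALUES (?)", (name,))
        (customer_id,) = conn.execute(
            "SELECT customer_id FROM customers WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
    conn.commit()
    return conn


def total_spend_by_customer(conn):
    """Join the two tables back together and sum the order amounts."""
    query = """
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()
```

Calling total_spend_by_customer(build_normalised_db()) returns one row per customer with that customer's summed order amounts. Each name is stored only once, so correcting a customer's name would touch a single row.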

Programme structure and assessment

The course is delivered as a series of morning lectures, followed by lab sessions in the afternoon where students will apply the lessons in a series of instructor-guided exercises, using data provided as part of the exercises.

This course is assessed through a mid-session project (25%) and a final take-home examination (75%). There will also be daily lab exercises which will not contribute to the final grade, but will allow students to test their understanding of the content.

Course outcomes

  • Gain an introduction to the challenges of working with big data using statistical methods

  • Understand how to integrate the insights from data analytics into knowledge generation and decision-making

  • Analyse how to acquire data, both structured and unstructured, process it, store it, and convert it into a format suitable for analysis

  • Discuss approaches to normalising data, using a database manager (SQLite) and working with SQL database queries

  • Understand the basics of statistical inference including probability and probability distributions, modelling and experimental design

  • Gain an overview of classification methods and related methods for assessing model fit and cross-validating predictive models

  • Analyse the difference between supervised and unsupervised learning approaches

  • Discuss quantitative methods of text analysis, including mining social media and other online resources
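To make the outcome on assessing model fit and cross-validating predictive models more concrete, here is a minimal sketch of k-fold cross-validation for a one-predictor linear regression. It is an illustration under stated assumptions rather than course material: it uses Python's standard library instead of R, the function names fit_line and k_fold_mse are hypothetical, and it assumes the predictor values in each training split are not all identical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the xs are not all equal
    return y_bar - slope * x_bar, slope


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # Fit on the training observations only.
        intercept, slope = fit_line(
            [xs[i] for i in train], [ys[i] for i in train]
        )
        # Score on the observations the model has not seen.
        errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out]
        fold_errors.append(mean(errors))
    return mean(fold_errors)
```

Because every observation is held out exactly once, the returned value estimates how well the fitted line predicts new data, which the in-sample fit alone tends to overstate.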

Is this course right for you?

This course is suitable if you already have prior training in quantitative methods and coding, and want to enhance this training with the fundamental concepts and techniques of Data Science and Data Analytics. It is also suitable for practitioners from industry, government, or research organisations with some basic training in quantitative analysis or computer programming.

The course surveys diverse techniques and methods, making it an ideal foundation for more advanced or specific training. If you are targeting a role in government, politics, data science, research, law, business management, consulting or marketing, you should consider this course.

Your department

LSE’s Department of Methodology is an internationally recognised centre of excellence in research and teaching in the area of social science research methodology. The disciplinary backgrounds of the staff include political science, statistics, sociology, social psychology, anthropology and criminology. The Department coordinates and provides a focus for methodological activities at the School, providing methods training to students from across the School.

With the training in the core social scientific tools of analysis and research offered by the Department of Methodology, coupled with its numerous workshops in other transferable skills such as computer programming and the use of methods-related software, the Department of Methodology ensures that the School’s students and staff have the expertise and training available to maintain the School’s excellence in social scientific research. We also work closely with colleagues in the Departments of Statistics and Mathematics to cover advanced topics, including in the interdisciplinary area of social applications of data science.

Your faculty

Professor Kenneth Benoit
Professor of Computational Social Science
Department of Methodology

Dr Jack Blumenau
Department of Political Science, UCL

Reading materials

James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning: With Applications in R. Springer.

Zumel, N. and Mount, J. (2014). Practical Data Science with R. Manning Publications.

The following are supplemental texts which you may also find useful:

Lantz, B. (2013). Machine Learning with R. Packt Publishing.

Conway, D. and White, J. (2012). Machine Learning for Hackers. O'Reilly Media.

Leskovec, J., Rajaraman, A. and Ullman, J. (2011). Mining of Massive Datasets. Cambridge University Press.

Zafarani, R., Abbasi, M. A. and Liu, H. (2014). Social Media Mining: An Introduction. Cambridge University Press.

*A more detailed reading list will be supplied prior to the start of the programme

**Course content, faculty and dates may be subject to change without prior notice
