Data Science and Big Data Analytics: An Introduction

  • Methods Summer Programme
  • Department of Methodology
  • Application code ME414
  • Starting 2017
  • Short course: Open
  • Location: Houghton Street, London

 

Data Science and Big Data Analytics are exciting new areas that combine scientific inquiry, statistical knowledge, substantive expertise, and computer programming. One of the main challenges for businesses and policy makers when using big data is finding people with the appropriate skills. Good data science requires experts who combine substantive knowledge with data-analytical skills, which makes it a prime area for social scientists with an interest in quantitative methods.

This course integrates prior training in quantitative methods (statistics) and coding with substantive expertise and introduces the fundamental concepts and techniques of Data Science and Big Data Analytics.

Typical students will be Master's and PhD students from any field who need the fundamentals of data science or who work with large datasets and databases. Practitioners from industry, government, or research organisations with some basic training in quantitative analysis or computer programming are also welcome. Because this course surveys diverse techniques and methods, it makes an ideal foundation for more advanced or more specific training. Our applications are drawn from social, political, economic, legal, business, and marketing fields, rather than engineering or other sciences.


 

Dates
14 - 25 August 2017

Teaching faculty
Professor Slava Mikhaylov, University of Essex
Dr Jack Blumenau, Department of Methodology
Professor Kenneth Benoit, Department of Methodology

2017 Tuition fees
Student rate: £1,495
Academic staff/charity rate: £2,230
Professional rate: £2,800

Programme details

Key facts

Dates
14 - 25 August 2017

Format
Lectures, practical classes

Assessment
Problem sets
Take home assignment (optional)

Location
LSE's Central London Campus

 

Prerequisites

Students should already be familiar with quantitative methods at an introductory level, up to linear regression analysis. Familiarity with computer programming or database structures is a benefit, but not formally required.

Course outline

This course aims to provide an introduction to the data science approach to the quantitative analysis of data using the methods of statistical learning, an approach blending classical statistical methods with recent advances in computational and machine learning. We will cover the main analytical methods from this field with hands-on applications using example datasets, so that students gain experience with and confidence in using the methods we cover. We also cover data preparation and processing, key-value formatted data (JSON), and unstructured textual data. At the end of this course students will have a sound understanding of the field of data science, the ability to analyse data using some of its main methods, and a solid foundation for more advanced or more specialised study.

The course will be delivered as a series of morning lectures, followed by afternoon lab sessions in which students apply the lessons through instructor-guided exercises using the data provided.

The course will cover the following topics:

  • an overview of data science and the challenge of working with big data using statistical methods
  • how to integrate the insights from data analytics into knowledge generation and decision-making
  • how to acquire data, both structured and unstructured, and to process it, store it, and convert it into a format suitable for analysis
  • the basics of statistical inference including probability and probability distributions, modelling, experimental design
  • an overview of classification methods and related methods for assessing model fit and cross-validating predictive models
  • supervised learning approaches, including linear and logistic regression, decision trees, and naïve Bayes
  • unsupervised learning approaches, including clustering, association rules, and principal components analysis
  • quantitative methods of text analysis, including mining social media and other online resources
  • social network analysis, covering the basics of social graph data and analysing social networks
  • data visualisation through a variety of graphs.

Main texts
James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning: With Applications in R. Springer.
Zumel, N. and Mount, J. (2014). Practical Data Science with R. Manning Publications.

The following are supplemental texts which you may also find useful:
Lantz, B. (2013). Machine Learning with R. Packt Publishing.
Conway, D. and White, J. (2012). Machine Learning for Hackers. O'Reilly Media.
Leskovec, J., Rajaraman, A. and Ullman, J. (2011). Mining of Massive Datasets. Cambridge University Press.
Zafarani, R., Abbasi, M. A. and Liu, H. (2014). Social Media Mining: An Introduction. Cambridge University Press.

Software used
R.

Schedule

Please note: A full timetable will be provided at registration on Monday 14 August. The schedule below is subject to change.

Week one (hours)

Day      Morning lecture    Afternoon class
Mon      3 hours            1.5 hours
Tues     3 hours            1.5 hours
Weds     3 hours            1.5 hours
Thurs    3 hours            1.5 hours
Fri      3 hours            1.5 hours

 

Week two (hours)

Day      Morning lecture    Afternoon class
Mon      3 hours            1.5 hours
Tues     3 hours            1.5 hours
Weds     3 hours            1.5 hours
Thurs    3 hours            1.5 hours
Fri      3 hours            Exam



Course benefits

This course provides participants with:

  • an understanding of the structure of datasets and databases, including "big data"
  • the ability to work with datasets and databases
  • an introduction to programming languages and basic skills in the R statistical program
  • the ability to analyse data using statistical and machine learning methods.

Faculty

Slava Mikhaylov is Professor of Public Policy and Data Science at the Institute of Analytics and Data Science and Department of Government, University of Essex. He is a Chief Scientific Advisor to Essex County Council and a co-investigator in the Consumer Data Research Centre at UCL, an ESRC Big Data infrastructure investment initiative. His research and teaching are primarily in the fields of computational social science and data science.

Dr Jack Blumenau is an ESRC "Future Research Leader" post-doc in the Methodology Department at the LSE. Previously he was a fellow and PhD student in the Government Department at the LSE. His PhD thesis explored the effects of legislative leaders on the behaviour of parliamentarians in the European Parliament and the UK House of Commons.

Kenneth Benoit is Professor of Quantitative Social Research Methods at the Department of Methodology, LSE. With a background in political science, his substantive work focuses on political party competition, political measurement issues, and electoral systems. His research and teaching are primarily in the field of social science statistical applications. His recent work concerns the quantitative analysis of text as data, for which he has developed a package for the R statistical software.

Testimonials

Awesome two weeks! Very good coverage of the various statistical models blended with implementing them in R.
2016 Participant

Great course for those who are new to this topic, both statistical research and R programming.
2016 Participant

I had a super time learning here in London with students from all over the world. Awesome course taught by awesome people!!
2016 Participant
