ST115      Half Unit
Managing and Visualising Data

This information is for the 2022/23 session.

Teacher responsible

Dr Mona Azadkia

Availability

This course is compulsory on the BSc in Data Science. This course is available on the BSc in Actuarial Science. This course is available with permission as an outside option to students on other programmes where regulations permit and to General Course students.

This course has a limited number of places (it is capped). Students who have this course as a compulsory course are guaranteed a place. Places for all other students are allocated on a first-come, first-served basis.

Pre-requisites

Students who have no previous experience in Python are required to take an online pre-sessional Python course from the Digital Skills Lab (https://moodle.lse.ac.uk/course/view.php?id=7696).

Although not a formal requirement, it is preferable that students have some familiarity with the basic concepts of probability and statistics, to the level of the first two chapters of ST102/ST107 (data visualisation and descriptive statistics, and probability theory).

Course content

The course focuses on the fundamental principles of effective manipulation and visualisation of data. It covers the key steps of a data analytics pipeline: formulating a data science problem, manipulating and visualising data, and, finally, creating actionable insights. The topics covered include methods for data cleaning and transformation; manipulation of data using tabular data structures; relational database models; structured query languages (e.g. SQL); processing of various human-readable data formats (e.g. JSON and XML); data visualisation methods for explanatory data analysis using statistical plots such as histograms and boxplots; and visualisation methods for time series data, multivariate data, and graph data.
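As an informal illustration of the kind of pipeline described above (not taken from the course materials), the following minimal sketch parses a small JSON document, cleans it, and aggregates it with SQL. The records, field names, table name and helper functions are all hypothetical, and only the Python standard library is assumed.

```python
import json
import sqlite3

# Hypothetical raw records in JSON; the field names and values are made up.
RAW_JSON = """
[
  {"user_name": "ana", "country": "UK",  "minutes": "42"},
  {"user_name": "ben", "country": "uk ", "minutes": "17"},
  {"user_name": "cho", "country": "FR",  "minutes": null},
  {"user_name": "dev", "country": "FR",  "minutes": "30"}
]
"""


def clean(records):
    """Normalise country codes and drop rows with a missing measurement."""
    cleaned = []
    for rec in records:
        if rec["minutes"] is None:
            continue  # data cleaning: skip incomplete rows
        country = rec["country"].strip().upper()  # transformation: tidy the code
        cleaned.append((rec["user_name"], country, int(rec["minutes"])))
    return cleaned


def total_minutes_by_country(rows):
    """Load cleaned rows into an in-memory SQLite table and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sessions (user_name TEXT, country TEXT, minutes INTEGER)"
        )
        conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", rows)
        query = (
            "SELECT country, SUM(minutes) FROM sessions "
            "GROUP BY country ORDER BY country"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


rows = clean(json.loads(RAW_JSON))
totals = total_minutes_by_country(rows)
for country, minutes in totals:
    print(country, minutes)
```

In practice the same steps would more likely be written with Pandas, and the aggregated totals would feed a plot; the standard-library version is used here only to keep the sketch self-contained.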

The course will cover basic concepts and principles and will enable students to gain hands-on experience in using Python programming for manipulation and visualisation of data. This will include use of standard modules and libraries such as NumPy, Pandas, Matplotlib and Seaborn, and programming environments such as Jupyter notebook.

The course will use examples drawn from a wide range of applications such as those that arise in online services, social media, social networks, finance, and machine learning. The principles and methods learned will enable students to effectively derive insights from data and communicate results to end users.

Teaching

20 hours of lectures and 15 hours of seminars in the LT.

This course will be delivered through a combination of classes, lectures and Q&A sessions totalling a minimum of 35 hours in Lent Term. This course includes a reading week in Week 6 of Lent Term.

Students are required to install Python on their own laptops and use their own laptops in the lectures and classes.

Formative coursework

Students will be expected to produce 8 exercises in the LT.

Weekly exercises will be given, in which students use Python and its libraries to apply data manipulation and visualisation methods to data.

Indicative reading


Essential Reading:

  1. W. McKinney, Python for Data Analysis, 2nd Edition, O’Reilly, 2017
  2. A. C. Müller and S. Guido, Introduction to Machine Learning with Python, O’Reilly, 2016
  3. D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning About a Highly Connected World, Cambridge University Press, 2010
  4. R. Ramakrishnan and J. Gehrke, Database Management Systems, 3rd Edition, McGraw Hill, 2002

Additional Reading: 

  1. NumPy, https://numpy.org/
  2. Python Data Analysis Library, https://pandas.pydata.org/
  3. Matplotlib, https://matplotlib.org
  4. Seaborn: statistical data visualization, https://seaborn.pydata.org
  5. NetworkX: Software for complex networks, https://networkx.org

Assessment

Coursework (30%) and project (70%) in the LT.

Students are required to hand in solutions to 2 sets of exercises using Python, each accounting for 15% of the final assessment, and a project report accounting for 70% of the final assessment. The project consists of applying data manipulation and visualisation methods to one or more datasets.

Key facts

Department: Statistics

Total students 2021/22: 56

Average class size 2021/22: 29

Capped 2021/22: Yes (60)

Value: Half Unit

Personal development skills

  • Self-management
  • Team working
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills