## ST444 Half Unit

Computational Data Science

**This information is for the 2024/25 session.**

**Teacher responsible**

Dr Yining Chen, COL 7.06

**Availability**

This course is available on the MSc in Data Science, MSc in Econometrics and Mathematical Economics, MSc in Statistics, MSc in Statistics (Financial Statistics), MSc in Statistics (Financial Statistics) (Research), MSc in Statistics (Research), MSc in Statistics (Social Statistics) and MSc in Statistics (Social Statistics) (Research). This course is available with permission as an outside option to students on other programmes where regulations permit.

**Pre-requisites**

Basic knowledge of calculus and linear algebra, as well as a first course in probability and statistics.

**Course content**

An introduction to the use of popular algorithms in statistics and data science, including (but not limited to) numerical linear algebra, optimisation, graph data and massive data processing, as well as their applications. Examples include least squares, maximum likelihood, principal component analysis, LASSO and graphical LASSO, and PageRank. Throughout the course, students will gain practical experience of implementing these computational methods in a programming language. Learning support will be provided for at least one programming language, such as R, Python or C++, but the choice of language supported may vary between years, depending on judged benefits to students, whether in terms of pedagogy or resulting skills. This year, the default choice is Python.

**Teaching**

This course will be delivered through a combination of classes/computer workshops/lectures/Q&A sessions totalling a minimum of 30 hours across Autumn Term. This course includes a reading week in Week 6.

Lectures will cover:

(1) **Introduction**: overview of the topics to be discussed, how numbers are represented in memory, floating point arithmetic, stability of numerical algorithms

(2) **Basic algorithms**: overview of different types of algorithms, Big-O notation, elementary complexity analysis, and their applications in data science

(3) **Tools in optimisation**: convexity, bisection, steepest descent, Newton’s method, quasi-Newton methods, stochastic gradient, coordinate descent, other related topics (e.g. stochastic search, ADMM)

(4) **Tools in numerical linear algebra**: Gaussian elimination, Cholesky decomposition, LU decomposition, matrix inversion and condition, computing eigenvalues and eigenvectors, and their applications

(5) **Other topics (if time permits)**: graph data processing, massive data processing, Monte Carlo methods, etc.
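As a flavour of topics (1) and (3), the following is a minimal illustrative sketch in Python, not taken from the course materials. It shows a well-known floating point rounding effect and a basic bisection root-finder; the function name `bisection` and its parameters `tol` and `max_iter` are assumptions chosen for this example.

```python
import math

# Floating point arithmetic: 0.1 and 0.2 have no exact binary representation,
# so their sum is not exactly equal to 0.3.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance instead


def bisection(f, lo, hi, tol=1e-10, max_iter=200):
    """Approximate a root of f in [lo, hi] (hypothetical helper).

    Assumes f is continuous and f(lo), f(hi) have opposite signs.
    """
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        # Stop when the midpoint is an exact root or the bracket is small enough.
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        # Keep the half of the bracket in which the sign change occurs.
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


# Example: the positive root of x^2 - 2, i.e. the square root of 2.
root = bisection(lambda x: x * x - 2, 0.0, 2.0)
print(root)
```

Bisection halves the bracket at every step, so the error shrinks linearly; methods such as Newton's typically converge faster near the root but need derivative information and a good starting point.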

**Formative coursework**

Students will be expected to produce 4 problem sets in the Autumn Term.

Bi-weekly exercises, involving computer programming and theory.

**Indicative reading**

Computational Statistics by Givens and Hoeting

Statistical computing in C++ and R by Eubank and Kupresanin

Foundations of Data Science by Blum, Hopcroft and Kannan

Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein

The Art of R Programming: A Tour of Statistical Software Design by Matloff

Think Python: How to Think Like a Computer Scientist by Downey

**Assessment**

Exam (70%, duration: 2 hours) in the spring exam period.

Coursework (30%).

**Student performance results**

(2020/21 - 2022/23 combined)

Classification | % of students |
---|---|
Distinction | 33.3 |
Merit | 33.3 |
Pass | 25 |
Fail | 8.3 |

**Key facts**

Department: Statistics

Total students 2023/24: 7

Average class size 2023/24: 8

Controlled access 2023/24: Yes

Value: Half Unit

**Course selection videos**

Some departments have produced short videos to introduce their courses. Please refer to the course selection videos index page for further information.

**Personal development skills**

- Self-management
- Team working
- Problem solving
- Application of information skills
- Communication
- Application of numeracy skills