Data Science

Overview

Data science is a broad, rapidly developing field that combines statistics, mathematics, artificial intelligence, machine learning and programming to extract and structure knowledge from data.

The accelerating growth in the volume of data created across science, society and commerce has made data science one of the fastest-growing fields in every industry.

Our research in data science focuses on the development of machine learning and computational statistical methods, their theoretical foundations, and their applications. These methods play an important role in a wide range of applications involving data characterised by variety, high dimensionality, volume or velocity.

We study machine learning algorithms for solving a variety of learning tasks, including supervised, semi-supervised, unsupervised, and reinforcement learning tasks. A special focus is devoted to fairness of machine learning, optimisation for machine learning, kernel methods, information theory, federated learning, and scalable models and tools for linking massive and distributed multimodal data. Our work on computational statistical methods includes Bayesian inference, functional data analysis, large-scale statistical inference, and non-parametric estimation.

Faculty

Non-parametric statistics, causal inference, high-dimensional statistics

Mona’s research is centered on advancing methodologies to analyse and quantify the dependency structure in data. Her work focuses on developing interpretable measures of dependence and conditional dependence between variables, which play a key role in various statistical tasks such as variable selection, dimensionality reduction, sensitivity analysis, causal inference, and hypothesis testing.

Big data linkage & analytics, artificial intelligence applied to healthcare and socioeconomic data, federated learning models, data science teaching and assessment

Marcos is an Associate Professor of Data Science (Education), teaching Databases and Distributed Computing for Big Data, and coordinating capstone projects in the MSc in Data Science programme. His research interests comprise big and multimodal data linkage and analysis, machine learning and artificial intelligence applied to healthcare and socioeconomic data, and generative AI tools in education.

He is affiliated with the LSE Data Science Institute, involved with teaching and research initiatives related to data science and generative AI in education across LSE. He is also an Associate Researcher at CIDACS (Centre for Data and Knowledge Integration for Health), in Salvador, Brazil, where he contributes to bespoke data linkage tools supporting the design of massive population-based cohorts. Over the last 13 years, he has been involved in international cooperation projects funded by NVIDIA, Google, Bill & Melinda Gates Foundation, The Rockefeller Foundation, MRC UK, HDR UK, Wellcome Trust, Royal Academy of Engineering, The British Academy, and The Royal Society.

Marcos holds a PhD in Computer Science (UFRGS, Brazil, 2010), followed by postdoctoral research and specialisation in Health Data Science (Institute of Health Informatics, University College London, 2016-2018). He also holds a Postgraduate Certificate in Higher Education (LSE, 2021) and is a Fellow of the UK Higher Education Academy (FHEA).

Change-point, nonparametric, shape constraint, computing

Yining's current research focuses on developing new methods for statistical problems such as change-point detection and nonparametric estimation. He is also interested in understanding the computational aspects of statistical methods. He completed his PhD (2014) in Statistics at the University of Cambridge.

Trustworthy AI, optimisation for deep learning, adversarial machine learning, formal guarantees

Alessandro’s research interests revolve around trustworthiness in deep learning, with a particular focus on robustness and explainability. This includes leveraging tools from optimisation to both provide and enforce formal guarantees on neural network behaviour, with the long-term goal of building machine learning models that are provably trustworthy.

Alessandro's work is applicable to safety-critical systems, and a wide range of applications within sensitive domains, including: medicine, transportation, and various industrial use-cases. More generally, his research is aimed at applications where formal guarantees on machine learning systems are essential.

Bayesian inference, Gaussian processes, latent stochastic processes, sequential learning, stochastic epidemic modelling, volatility estimation, bond risk premia

Kostas’ research focuses on developing and applying advanced computational methods, such as Markov Chain and Sequential Monte Carlo, for Bayesian inference. His methodology has mostly targeted continuous-time probability models based on stochastic differential equations driven by standard or fractional Brownian motion. The areas of application include financial and econometric time series as well as biomedical problems such as stochastic epidemic models and the analysis of growth curves.

Prior to joining the Statistics Department of LSE, he was a post-doctoral researcher at the University of Cambridge, in the Signal Processing Laboratory of the Engineering Department. He completed his PhD (2007) in the Statistics Department of the Athens University of Economics and Business, while spending some time at the University of Lancaster.

Probabilistic machine learning, Bayesian inference, Gaussian processes, variational inference, inverse problems

Ieva’s research centres around Bayesian inference with a focus on inverse problems and uncertainty quantification in physics-based problems. These areas provide frameworks for making informed decisions in the presence of uncertainty, which is inherent in complex systems like climate modelling and environmental prediction. Inverse problems, in particular, are fundamental in reconstructing unknown quantities from observed data, a common challenge in fields such as geophysics and engineering.

The primary methodologies utilised and developed in her work are Gaussian processes for modelling complex spatial and temporal dependencies, and variational inference for scalable posterior estimation in high-dimensional Bayesian models. Applying these techniques to partial differential equation (PDE) based inverse problems and experimental design is of particular interest.

In recent years, Ieva has worked on interdisciplinary projects in climate modelling, focusing on experimental design for ice sheet modelling and the analysis of climate simulator data, carried out in partnership with the British Antarctic Survey.

Prior to joining LSE, Ieva held a Senior Research Fellowship in Statistical Science at UCL and a Biometrika Postdoctoral Research Fellowship at the University of Cambridge.

High-dimensional inference, algorithmic fairness, data science

Joshua's research interests involve improving practices in data science and machine learning to reduce the impact of bias, particularly biases associated with social harms and scientific reproducibility. This includes developing methods and software for statistical inference after model selection, and using causality to analyse the fairness and interpretability of algorithms in machine learning and artificial intelligence. More broadly, he is interested in high-dimensional statistics and causal inference, and in teaching theory, applications, and best practices in data science using the R statistical programming language.

Reinforcement learning, causal inference, statistical inference

Chengchun's research lies at the intersection of artificial intelligence (AI) and statistics, with a particular focus on large language models (LLMs) and reinforcement learning (RL). His past research, supported by an EPSRC New Investigator Award, primarily focused on RL, which has become one of the most popular frontiers in AI. His current research has shifted towards LLMs, which have been developed at an unprecedented speed, transforming the way we learn, work, and communicate. He aims to apply his expertise in statistics and RL to make LLMs more transparent, accountable, and trustworthy.

Statistical machine learning, information theoretical estimators, kernel methods, scalable computation

Zoltan Szabo is a Professor of Data Science at the Department of Statistics, LSE. Zoltan's research interest is statistical machine learning with focus on kernel methods, information theory (ITE), scalable computation, and their applications. These applications include safety-critical learning, style transfer, shape-constrained prediction, hypothesis testing, distribution regression, dictionary learning, structured sparsity, independent subspace analysis and its extensions, Bayesian inference, finance, economics, analysis of climate data, criminal data analysis, collaborative filtering, emotion recognition, face tracking, remote sensing, natural language processing, and gene analysis.

Zoltan enjoys helping and interacting with the machine learning (ML) and statistics community in various forms. He serves/served as (i) a Senior Area Chair/an Area Chair of the most prestigious ML conferences including ICML, NeurIPS, COLT, AISTATS, UAI, IJCAI, ICLR, (ii) the moderator of statistical machine learning (stat.ML) on arXiv, (iii) the Programme Director of MSc Data Science, (iv) an editorial board member of JMLR, a senior associate editor of the journal ACM Transactions on Probabilistic Machine Learning, and an associate editor of the journal Mathematical Foundations of Computing, (v) a reviewer of European (ERC), Israeli (ISF) and Swiss (SNSF) grant applications, (vi) a mentor of newcomers (NeurIPS, ICML). For further details, see his website.

Algorithms, decision making, machine learning, optimisation, statistical inference

Milan Vojnović’s research focuses on machine learning and optimisation, involving the development of novel algorithms and the analysis of their theoretical guarantees to support the design of efficient intelligent systems. He has made significant contributions to scalable optimisation methods for machine learning, multi-armed bandits, multi-agent systems, algorithms under uncertainty, game theory, and network system control and optimisation. His work has been applied across a variety of domains, including online platforms, computer networks, and machine learning systems.

He has received several best paper awards at leading conferences, as well as the ACM SIGMETRICS Rising Star Researcher Award and the ERCIM Cor Baayen Award. He is also the author of Contest Theory, published by Cambridge University Press in 2016.

Milan has held a visiting scientist position at Meta. Prior to joining LSE, he spent 13 years as a researcher at Microsoft Research, working on a wide range of projects. He also held a two-year appointment as an affiliated lecturer at the Statistical Laboratory, University of Cambridge. He was awarded his PhD in 2003 by the École Polytechnique Fédérale de Lausanne (EPFL), Switzerland.

High-dimensional statistics, changepoint analysis, dimension reduction, statistical-computational trade-offs

Tengyao Wang is broadly interested in the area of high-dimensional statistics. His research focuses mainly on developing computationally efficient procedures for high-dimensional problems, while at the same time understanding the potential statistical limitations imposed by computational constraints. Some of his current research interests include: (i) sparse signal detection in high-dimensional data; (ii) change-point detection and estimation problems; (iii) dimension reduction techniques; (iv) robust statistical procedures in the presence of missing data or heavy-tailed noise; (v) nonparametric statistical inference; and (vi) applications, including medical statistics, financial data analysis and statistical learning-assisted material discovery.

Prior to joining LSE as an associate professor, Tengyao was a lecturer at University College London and a research fellow at Cantab Capital Institute for the Mathematics of Information, University of Cambridge. Tengyao was awarded the Royal Statistical Society Research Prize in 2019 and the Guy Medal in Bronze in 2023.

Research students

Hitesh Gudwani
Research interests: Machine learning, decision making under uncertainty, multi-armed bandits, optimisation and AI alignment

Sakina Hansen
Research interests: Fair machine learning, explainability, equitable data science, philosophy and ethics of machine learning

Ziqing Ho
Research interests: Non-parametric regression, high-dimensional statistics, and machine learning

Liyuan Hu
Research interests: Reinforcement learning and statistical inference

Tao Ma
Research interests:

Pingfan Su
Research interests: Reinforcement learning, causal inference, generative AI and their applications in finance

Hanqi Wang
Research interests: Machine learning, statistical inference and statistical learning theory

Trevor Wrobleski
Research interests: Operations research, high-dimensional variable selection, computational efficiency optimisation, model averaging, and spatio-temporal modelling

Erhan Xu
Research interests: Reinforcement learning and large language models, focusing on LLM alignment, post-training of reasoning models

Kai Ye
Research interests: Offline reinforcement learning, confounded partially observable Markov decision processes (POMDPs), and high-dimensional statistics

Faculty

Mona Azadkia - Assistant Professor

Marcos Barreto - Associate Professor (Education), Department Lead on AI and KEI Strategic Lead

Yining Chen - Associate Professor

Alessandro De Palma - Associate Professor

Kostas Kalogeropoulos - Associate Professor and MSc Statistics Programme Director

Ieva Kazlauskaitė - Assistant Professor

Joshua Loftus - Assistant Professor

Chengchun Shi - Associate Professor

Zoltan Szabo - Professor of Data Science and MSc Data Science Programme Director

Milan Vojnović - Professor and Head of Department

Tengyao Wang - Professor and MSc Statistics (Financial Statistics) Programme Director
