Data science is a broad, rapidly developing field that combines statistics, mathematics, artificial intelligence, machine learning and programming to extract and structure knowledge from data.
The accelerating volume of data being created across science, society and commerce has made data science one of the fastest growing fields across every industry.
Our research in the data science area focuses on the development of machine learning and computational statistical methods, their theoretical foundations, and applications. Machine learning and computational statistics play an important role in a wide range of applications involving data characterised by variety, high dimensionality, volume or velocity.
We study machine learning algorithms for solving a variety of learning tasks, including supervised, semi-supervised, unsupervised, and reinforcement learning tasks. A special focus is devoted to fairness of machine learning, optimisation for machine learning, kernel methods, information theory, federated learning, and scalable models and tools for linking massive and distributed multimodal data. Our work on computational statistical methods includes Bayesian inference, functional data analysis, large-scale statistical inference, and non-parametric estimation.
Mona’s research is centered on advancing methodologies to analyse and quantify the dependency structure in data. Her work focuses on developing interpretable measures of dependence and conditional dependence between variables, which play a key role in various statistical tasks such as variable selection, dimensionality reduction, sensitivity analysis, causal inference, and hypothesis testing.
Marcos Barreto - Associate Professor (Education), Department Lead on AI and KEI Strategic Lead
Big data linkage & analytics, artificial intelligence applied to healthcare and socioeconomic data, federated learning models, data science teaching and assessment
Marcos is an Associate Professor of Data Science (Education), teaching Databases and Distributed Computing for Big Data, and coordinating capstone projects in the MSc in Data Science programme. His research interests comprise big and multimodal data linkage and analysis, machine learning and artificial intelligence applied to healthcare and socioeconomic data, and generative AI tools in education.
He is affiliated with the LSE Data Science Institute, involved with teaching and research initiatives related to data science and generative AI in education across LSE. He is also an Associate Researcher at CIDACS (Centre for Data and Knowledge Integration for Health), in Salvador, Brazil, where he contributes to bespoke data linkage tools supporting the design of massive population-based cohorts. Over the last 13 years, he has been involved in international cooperation projects funded by NVIDIA, Google, Bill & Melinda Gates Foundation, The Rockefeller Foundation, MRC UK, HDR UK, Wellcome Trust, Royal Academy of Engineering, The British Academy, and The Royal Society.
Marcos holds a PhD in Computer Science (UFRGS, Brazil, 2010), followed by postdoctoral research and specialisation in Health Data Science (Institute of Health Informatics, University College London, 2016-2018). He also holds a Postgraduate Certificate in Higher Education (LSE, 2021) and is a Fellow of the UK Higher Education Academy (FHEA).
Yining's current research focuses on developing new methods for statistical problems such as change-point detection and nonparametric estimation. He is also interested in understanding the computational aspects of statistical methods. He completed his PhD (2014) in Statistics at the University of Cambridge.
Kostas Kalogeropoulos - Associate Professor and MSc Statistics Programme Director
Kostas’ research focuses on developing and applying advanced computational methods, such as Markov Chain and Sequential Monte Carlo, for Bayesian Inference. His methodology has mostly targeted continuous time probability models based on stochastic differential equations driven by standard or fractional Brownian motion. The areas of application include Financial and Econometric Time Series as well as biomedical problems such as stochastic epidemic models and analysis of growth curves.
Prior to joining the Statistics Department of LSE, he was a post-doctoral researcher at the University of Cambridge, in the Signal Processing Laboratory of the Engineering Department. He completed his PhD (2007) in the Statistics Department of the Athens University of Economics and Business while spending some time at the University of Lancaster.
Ieva’s research centres around Bayesian inference with a focus on inverse problems and uncertainty quantification in physics-based problems. These areas provide frameworks for making informed decisions in the presence of uncertainty, which is inherent in complex systems like climate modelling and environmental prediction. Inverse problems, in particular, are fundamental in reconstructing unknown quantities from observed data, a common challenge in fields such as geophysics and engineering.
The primary methodologies utilised and developed in her work are Gaussian processes for modelling complex spatial and temporal dependencies, and variational inference for scalable posterior estimation in high-dimensional Bayesian models. Applying these techniques to partial differential equation (PDE) based inverse problems and experimental design is of particular interest.
In recent years, Ieva has worked on interdisciplinary projects in climate modelling, focusing on experimental design for ice sheet modelling and analysis of climate simulator data, carried out in partnership with the British Antarctic Survey.
Prior to joining LSE, Ieva held a Senior Research Fellowship in Statistical Science at UCL and a Biometrika Postdoctoral Research Fellowship at the University of Cambridge.
Joshua Loftus - Assistant Professor
High-dimensional inference, algorithmic fairness, data science
Joshua's research interests involve improving practices in data science and machine learning to reduce the impact of bias, particularly biases associated with social harms and scientific reproducibility. This includes developing methods and software for statistical inference after model selection, and using causality to analyse the fairness and interpretability of algorithms in machine learning and artificial intelligence. More broadly, he is interested in high-dimensional statistics and causal inference, and in teaching theory, applications, and best practices in data science using the R statistical programming language.
Chengchun's research is concentrated on the intersection between artificial intelligence (AI) and statistics, with a particular focus on large language models (LLMs) and reinforcement learning (RL). His past research, supported by an EPSRC New Investigator Award, primarily focused on RL, which has become one of the most popular frontiers in AI. His current research has shifted towards LLMs, which have been developed at an unprecedented speed, transforming the way we learn, work, and communicate. He aims to apply his expertise in statistics and RL to make LLMs more transparent, accountable, and trustworthy.
Zoltan Szabo - Professor of Data Science and MSc Data Science Programme Director
Statistical machine learning, information theoretical estimators, kernel methods, scalable computation
Zoltan Szabo is a Professor of Data Science at the Department of Statistics, LSE. Zoltan's research interest is statistical machine learning with a focus on kernel methods, information theory (ITE), scalable computation, and their applications. These applications include safety-critical learning, style transfer, shape-constrained prediction, hypothesis testing, distribution regression, dictionary learning, structured sparsity, independent subspace analysis and its extensions, Bayesian inference, finance, economics, analysis of climate data, criminal data analysis, collaborative filtering, emotion recognition, face tracking, remote sensing, natural language processing, and gene analysis.
Zoltan enjoys helping and interacting with the machine learning (ML) and statistics community in various forms. He serves or has served as (i) a Senior Area Chair or Area Chair of the most prestigious ML conferences, including ICML, NeurIPS, COLT, AISTATS, UAI, IJCAI and ICLR, (ii) the moderator of statistical machine learning (stat.ML) on arXiv, (iii) the Programme Director of MSc Data Science, (iv) an editorial board member of JMLR, a senior associate editor of the journal ACM Transactions on Probabilistic Machine Learning, and an associate editor of the journal Mathematical Foundations of Computing, (v) a reviewer of European (ERC), Israeli (ISF) and Swiss (SNSF) grant applications, and (vi) a mentor of newcomers (NeurIPS, ICML). For further details, see his website.
Milan Vojnović’s research focuses on machine learning and optimisation, involving the development of novel algorithms and the analysis of their theoretical guarantees to support the design of efficient intelligent systems. He has made significant contributions to scalable optimisation methods for machine learning, multi-armed bandits, multi-agent systems, algorithms under uncertainty, game theory, and network system control and optimisation. His work has been applied across a variety of domains, including online platforms, computer networks, and machine learning systems.
He has received several best paper awards at leading conferences, as well as the ACM SIGMETRICS Rising Star Researcher Award and the ERCIM Cor Baayen Award. He is also the author of Contest Theory, published by Cambridge University Press in 2016.
Milan has held a visiting scientist position at Meta. Prior to joining LSE, he spent 13 years as a researcher at Microsoft Research, working on a wide range of projects. He also held a two-year appointment as an affiliated lecturer at the Statistical Laboratory, University of Cambridge. He was awarded his PhD in 2003 by the École Polytechnique Fédérale de Lausanne (EPFL), Switzerland.
Tengyao Wang - Professor and MSc Statistics (Financial Statistics) Programme Director
Tengyao Wang is broadly interested in the area of high-dimensional statistics. His research focuses mainly on developing computationally efficient procedures for high-dimensional problems, while at the same time understanding the potential statistical limitations imposed by computational constraints. Some of his current research interests include: (i) sparse signal detection in high-dimensional data; (ii) change-point detection and estimation problems; (iii) dimension reduction techniques; (iv) robust statistical procedures in the presence of missing data or heavy-tailed noise; (v) nonparametric statistical inference; and (vi) applications, including medical statistics, financial data analysis and statistical learning-assisted material discovery.
Prior to joining LSE as an associate professor, Tengyao was a lecturer at University College London and a research fellow at Cantab Capital Institute for the Mathematics of Information, University of Cambridge. Tengyao was awarded the Royal Statistical Society Research Prize in 2019 and the Guy Medal in Bronze in 2023.
Research students
Sakina Hansen - Research interests: Fair machine learning, explainability, equitable data science, philosophy and ethics of machine learning
Ziqing Ho - Research interests: Non-parametric regression, high-dimensional statistics, and machine learning
Liyuan Hu - Research interests: Reinforcement learning and statistical inference
Pingfan Su - Research interests: Reinforcement learning, causal inference, generative AI and their applications in finance
Trevor Wrobleski - Research interests: Operations research, high-dimensional variable selection, computational efficiency optimisation, model averaging, and spatio-temporal modelling
Kai Ye - Research interests: Offline reinforcement learning, confounded partially observable Markov decision processes (POMDPs), and high-dimensional statistics