Data Science Seminar Series

The data science seminar series aims to promote research related to machine learning, computer science, statistics and their interface. We invite both internal and external speakers to present their latest cutting-edge research. All staff and students are welcome to attend our seminars!

Lent Term 2023

Monday 30 January, 2-3pm - David Ginsbourger (University of Bern)



This event will take place in the Graham Wallas Room (OLD 5.25).

Title - On Gaussian Process multiple-fold cross-validation

Abstract - In this talk I will give an overview of some recent results on the fast calculation of Gaussian Process multiple-fold cross-validation residuals and their covariances, and on kernel hyperparameter estimation via related approaches. First, I will focus on results from arXiv:2101.03108 (joint work with Cedric Schärer), where fast Gaussian process leave-one-out formulae are generalized to multiple-fold cross-validation. Particular attention will be paid to how the design of the folds affects covariance hyperparameter fitting. In particular, I will present results of joint work with Athénaïs Gautier and Cédric Travelletti on an inverse problem from the geosciences, where the considered formulae and criteria are applied to linear forms of the underlying GP and the way observations are partitioned is found to substantially affect range estimation.
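To illustrate the kind of shortcut the abstract refers to, here is a minimal sketch (not the speaker's code). It assumes a zero-mean GP with a squared-exponential kernel plus a small nugget; the kernel, its parameters, the data and the function names are all hypothetical. For a fold I, the standard block-inverse identity gives the cross-validation residuals as inv(Q[I, I]) @ (Q @ y)[I] with Q = inv(K), so a single inverse of the full covariance serves every fold. The sketch checks this against refitting on the points outside each fold.

```python
import numpy as np


def sq_exp_kernel(x1, x2, length_scale=0.3, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays (an illustrative choice)."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def fast_fold_residuals(K, y, folds):
    """Multiple-fold CV residuals from one inverse of the full covariance K.

    For a fold I: e_I = inv(Q[I, I]) @ (Q @ y)[I], where Q = inv(K).
    (inv(Q[I, I]) is also the covariance of e_I under the GP model.)
    """
    Q = np.linalg.inv(K)  # a Cholesky-based solve would be preferable in practice
    Qy = Q @ y
    return [np.linalg.solve(Q[np.ix_(I, I)], Qy[I]) for I in folds]


def naive_fold_residuals(K, y, folds):
    """Reference: condition on the points outside each fold, then predict the fold."""
    n = len(y)
    residuals = []
    for I in folds:
        J = np.setdiff1d(np.arange(n), I)
        mean_I = K[np.ix_(I, J)] @ np.linalg.solve(K[np.ix_(J, J)], y[J])
        residuals.append(y[I] - mean_I)
    return residuals


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 12)
K = sq_exp_kernel(x, x) + 1e-2 * np.eye(x.size)  # small nugget keeps K well conditioned
y = rng.multivariate_normal(np.zeros(x.size), K)  # one hypothetical GP sample path
folds = np.array_split(rng.permutation(x.size), 4)  # four random folds of three points

fast = fast_fold_residuals(K, y, folds)
naive = naive_fold_residuals(K, y, folds)
print(all(np.allclose(f, s) for f, s in zip(fast, naive)))  # prints True
```

The fast version needs only one factorization of K plus one small solve per fold, whereas the naive version refactorizes the covariance of the retained points for every fold.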

Biography - David Ginsbourger is Professor (Extraordinarius) of Statistical Data Science and co-Director of the Institute of Mathematical Statistics and Actuarial Science at the University of Bern, where he currently serves as Director of Studies in Statistics and leads the "Uncertainty Quantification and Spatial Statistics" research group. From 2015 to 2020, he mainly worked as a permanent senior researcher at Idiap Research Institute. He defended his PhD in Applied Mathematics at the Ecole Nationale Supérieure des Mines de Saint-Etienne in 2009. He is currently on the editorial boards of the SIAM/ASA Journal on Uncertainty Quantification and of Technometrics, and has been an area chair/metareviewer for ICML 2022, NeurIPS 2022, and AISTATS 2023.

Take a look at David's slides (PDF)

Monday 27 February, 2-3pm - Erwan Scornet (Ecole Polytechnique)



This event will take place in the Graham Wallas Room (OLD 5.25).

Title - Is interpolation benign for random forests?

Abstract - Statistical wisdom suggests that very complex models, which interpolate the training data, will predict unseen examples poorly. Yet this aphorism has recently been challenged by the identification of benign overfitting regimes, studied especially in the case of parametric models: generalization capabilities may be preserved despite high model complexity. While it is widely known that fully grown decision trees interpolate and, in turn, have poor predictive performance, the same behavior is yet to be analyzed for random forests. In this talk, I will present how the trade-off between interpolation and consistency plays out for several types of random forest models. In particular, I will establish that interpolation regimes and consistency cannot be achieved simultaneously by non-adaptive random forests. Since adaptivity seems to be the key to reconciling interpolation and consistency, we study the Median RF, which is shown to be consistent even in the interpolation setting. Regarding Breiman's forest, we theoretically control the size of the interpolation area, which converges to zero fast enough that exact interpolation and consistency can occur together.

Biography - Since September 2016, Erwan Scornet has been an assistant professor at the Center for Applied Mathematics (CMAP) at Ecole Polytechnique, near Paris. His research interests focus on theoretical statistics and machine learning, with a particular emphasis on nonparametric estimates. His PhD thesis, supervised by Gérard Biau (LSTA - Paris 6) and Jean-Philippe Vert (Institut Curie), studied a particular machine learning algorithm called random forests.

Take a look at Erwan's slides (PDF)

Monday 20 March, 2-3pm - Patrick Loiseau (INRIA, Ecole Polytechnique, ENSAE)



This event will take place in the Graham Wallas Room (OLD 5.25).

Title - Statistical discrimination in selection and matching

Abstract - Discrimination in selection problems such as hiring or college admission is often explained by implicit bias of the decision-maker against a disadvantaged demographic group. In this talk, we argue that discrimination may arise from second-order statistical properties even in the absence of bias. We consider a model where the decision-maker receives a noisy estimate of each candidate's quality, whose variance depends on the candidate's demographic group; we term this implicit (or differential) variance. We show that regardless of the information the decision-maker uses to make its selection (Bayesian or group-oblivious), differential variance leads to discrimination in the selection. We then study the effect of affirmative action policies on selection quality and show that, in some cases, they may even increase it. Finally, we analyze a stable matching problem in which two decision-makers select from the same pool of candidates. We show that even in the absence of differential variance, a difference across groups in the correlation between the two decision-makers' quality estimates leads to discrimination.
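The selection model in this abstract can be explored with a small simulation. The sketch below is illustrative only and is not the speakers' model or code: it assumes two groups whose true quality is identically distributed, W ~ N(0, sigma0^2), observed through W_hat = W + N(0, sigma_g^2) with a group-dependent noise level. The parameter values (sigmas=(0.5, 1.5), alpha=0.1) and the function name selection_rates are hypothetical. The decision-maker keeps the top alpha share of candidates, ranking either raw estimates (group-oblivious) or posterior means E[W | W_hat] (Bayesian).

```python
import numpy as np


def selection_rates(sigmas=(0.5, 1.5), sigma0=1.0, alpha=0.1,
                    bayesian=False, n_per_group=200_000, seed=0):
    """Fraction of each group selected when the top `alpha` share of all candidates is kept.

    Hypothetical model: true quality W ~ N(0, sigma0^2) in every group, and the
    decision-maker observes W_hat = W + N(0, sigma_g^2), with sigma_g set per group.
    """
    rng = np.random.default_rng(seed)
    scores, groups = [], []
    for g, sigma_g in enumerate(sigmas):
        w = rng.normal(0.0, sigma0, n_per_group)           # true quality
        w_hat = w + rng.normal(0.0, sigma_g, n_per_group)  # noisy quality estimate
        if bayesian:
            # Posterior mean E[W | W_hat]: noisier estimates are shrunk harder towards 0.
            w_hat = sigma0**2 / (sigma0**2 + sigma_g**2) * w_hat
        scores.append(w_hat)
        groups.append(np.full(n_per_group, g))
    scores = np.concatenate(scores)
    groups = np.concatenate(groups)
    k = int(alpha * scores.size)
    selected = groups[np.argsort(scores)[-k:]]  # group labels of the k highest scores
    return [np.count_nonzero(selected == g) / n_per_group for g in range(len(sigmas))]


print(selection_rates(bayesian=False))  # group-oblivious: rank raw estimates
print(selection_rates(bayesian=True))   # Bayesian: rank posterior means
```

Under these assumptions, the group-oblivious rule selects the noisier group more often, while the Bayesian rule selects it less often; in both cases the two groups' selection rates differ even though their true quality is identically distributed.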

Biography - Patrick Loiseau is a researcher at Inria Saclay, and an adjunct Professor at Ecole Polytechnique and ENSAE (Palaiseau). He is the co-head of the FairPlay team, a joint team between Criteo, ENSAE, Ecole Polytechnique, and Inria. Since 2019, he has also been the co-holder of a chair of the MIAI@Grenoble Alpes institute on “Explainable and Responsible AI”. Prior to joining Inria, he was an Assistant Professor of data science at EURECOM, and he held long-term visiting positions at UC Berkeley and at the Max-Planck Institute for Software Systems (MPI-SWS), where he was the recipient of a Humboldt fellowship for experienced researchers (2016). He works on game theory and machine learning, with a focus on societal and ethical aspects (fairness and privacy) and on security and privacy.

Monday 27 March, 4-5pm - Yeganeh Alimohammadi (Stanford)



This event will take place on Zoom.

Please note the time change for this week only.

Title - The Power of a Few Local Samples for Predicting Epidemics

Abstract - People's interaction networks play a critical role in epidemics. However, accurately mapping these interactions can be expensive and sometimes impossible, making it difficult to predict the likelihood and outcome of an outbreak. We show that, instead, contact tracing a few samples from the population is enough to estimate an outbreak's likelihood and size. I will present a model-free estimator based on the contact tracing results and give theoretical guarantees on the estimator's accuracy for a large class of networks.

Biography - Yeganeh is a Ph.D. student in operations research at Stanford University, where she is advised by Amin Saberi. Her research interests are algorithm design and operations research, with an emphasis on applications. In particular, she studies the theoretical foundations of network models of practical importance, focusing mainly on epidemics on networks, efficient sampling algorithms for large networks, and network optimization.

Summer Term 2023

Monday 15 May, 2-3pm - Marta Blangiardo (Imperial College)



This event will take place in 32 Lincoln's Inn Fields (32L.1.05).

Title - TBC

Abstract - TBC

Biography - Marta Blangiardo is a professor of Biostatistics in the Department of Epidemiology and Biostatistics and leads the Biostatistics and Data Science theme of the MRC Centre for Environment and Health. Her main interests are the methodological aspects of environmental exposure estimation, and spatial and spatio-temporal models for disease mapping and risk assessment.

Monday 22 May, 2-3pm - Tengyuan Liang (University of Chicago)



This event will take place in 32 Lincoln's Inn Fields (32L.1.05).

Title - TBC

Abstract - TBC

Biography - Tengyuan Liang is a Professor at the University of Chicago, Booth School of Business. He uses principles from learning and statistics to understand models and data.

Tengyuan's research is supported by the NSF CAREER grant and the William Ladany faculty fellowship. His current research aims to: bridge the empirical and theoretical gap in modern statistical learning; understand optimization and inference of infinite-dimensional models; explore the role of stochasticity in solving non-convex optimization.

Monday 12 June, 2-3pm - Konstantina Palla (Spotify)


This event will take place in the Leverhulme Library (COL 6.15).

Title - TBC

Abstract - TBC

Biography - Konstantina Palla is a senior research scientist at Spotify Research, working with a great team in London, UK. Before that, she was a senior researcher at Microsoft Research Cambridge, as part of ML for Healthcare. She was a postdoc at the University of Oxford working with Yee Whye Teh, and did her PhD with Zoubin Ghahramani at the University of Cambridge, where she studied the fairy tales of Machine Learning.

A Bayesian by training, Konstantina spent years building probabilistic models that uncover the latent structure in data. Her work focused on methodology and was applied to a variety of data, from genomic to relational. More recently, she moved away from methodology-driven research towards solving real-world challenges, most recently in the health domain. She built models to predict adverse patient outcomes in a hospital, working mostly on time series with both probabilistic and deep learning approaches. She is a firm believer in co-developing ML models with the users they are intended to assist.


Past Seminars archive

MT 2022 (PDF)

MT 2021 - LT 2022 (PDF)

MT 2020 - LT 2021 (PDF)