Joint Econometrics and Statistics Seminar Series

Statistics takes the numbers that you have and summarises them into fewer numbers which are easily digestible by the human brain

The Joint Econometrics and Statistics Seminar Series is organised jointly by the Department of Statistics and the STICERD Econometrics Programme, focusing on research in statistics, econometrics, and their interface. We invite distinguished scholars to present cutting-edge work on methodology, theory, and case studies. These seminars take place from 12pm to 1pm on Fridays, and all students and staff are welcome to attend!

Lent Term 2023

Friday 20 January, 12-1pm - Rob Cornish (University of Oxford)

Website

This event will take place in the Graham Wallas Room (OLD 5.25).

Title - Causal Falsification of Digital Twins

Abstract - We consider how to assess the accuracy of a digital twin using real-world data. We formulate this problem within the framework of causal inference, which leads to a precise definition of what it means for a twin to be "correct" that seems appropriate for many applications. Unfortunately, fundamental results from the causal inference literature mean observational data cannot be used to certify that a twin is correct in this sense unless potentially tenuous assumptions are made, such as that the data are free of unmeasured confounding. To avoid these assumptions, we propose an assessment strategy that instead aims to find situations in which the twin is not correct, and present a general-purpose statistical procedure for doing so. Our approach yields reliable and actionable information about the twin under only the assumption of an i.i.d. dataset of observational trajectories, and in particular remains sound regardless of whether or not the data are confounded. We demonstrate the effectiveness of our methodology through a large-scale, real-world case study involving sepsis modelling within the Pulse Physiology Engine, which we assess using the MIMIC-III dataset of ICU patients.

Biography - Rob Cornish is a Florence Nightingale Bicentennial Fellow in the Department of Statistics at the University of Oxford. He previously completed a postdoc with Chris Holmes and Arnaud Doucet working on conformal inference and causal machine learning. Before that he completed his DPhil as part of the AIMS CDT under the supervision of Arnaud Doucet and George Deligiannidis. His thesis covered several topics in (primarily) Bayesian machine learning, including Monte Carlo methods and deep generative modelling. Before they left Oxford, he also worked with Frank Wood and Hongseok Yang on topics in probabilistic programming.

Take a look at Rob's slides (PDF)

Friday 3 February, 3-4pm - Jiayi Wang (University of Texas at Dallas)

Website

This event will take place on Zoom: register here.

Please note the time change for this week only.

Title - Super Reinforcement Learning in Confounded Environments

Abstract - We introduce super reinforcement learning in the batch setting, which takes the observed action as input for achieving a stronger oracle in policy learning. In the presence of unmeasured confounders, the recommendations from human agents recorded in the observed data allow us to recover certain unobserved information. By including this information in the policy search, the proposed super reinforcement learning yields a super-policy that is guaranteed to outperform both the standard optimal policy and the behavior policy (e.g., human agents' recommendations). Furthermore, to address the issue of unmeasured confounding in finding super-policies, a number of non-parametric identification results are established. Based on these identification results, we develop several super-policy learning algorithms and derive their corresponding finite-sample regret guarantees. Finally, we illustrate the superior performance of our proposal through extensive simulations and two real datasets related to improving health policy.

Bio - Jiayi Wang is an Assistant Professor in the Department of Mathematical Sciences at the University of Texas at Dallas. She obtained her Ph.D. degree in the Department of Statistics at Texas A&M University (TAMU), advised by Dr. Raymond Wong. Prior to TAMU, she received a B.S. in Statistics from Zhejiang University in 2017.

Jiayi Wang is broadly interested in methodology and theory in nonparametric statistics and machine learning. Her recent research focuses on statistical problems with complex functional data or unknown missing structures.

Take a look at Jiayi's slides (PDF)

Friday 17 February, 12-1pm - Haben Michael (University of Massachusetts)

Website

This event will take place in 32 Lincoln's Inn Fields (32L.1.05).

Title - Two projects on the AUC

Abstract - I will describe two current projects dealing with the AUC. The AUC is a measure of how well a binary classifier discriminates. Long popular in the medical sciences, it has seen new life in data science applications. In the first project, we consider two generalizations of the AUC to accommodate clustered data. We describe situations in which the two cluster AUCs diverge and other situations in which they coincide. In the second, we describe a nonparametric method of estimating the AUC of an index β^Tx when β is estimated from the same data, with a focus on nonparametric estimation of the difference of the AUCs of two distinct indices.
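
For readers less familiar with the AUC, here is a minimal sketch (in Python, with illustrative variable names) of the standard empirical AUC of a linear index β^T x; it illustrates the quantity being estimated, not the speaker's estimators for clustered data or for differences of AUCs.

```python
import numpy as np

def empirical_auc(scores, y):
    """Empirical AUC (Mann-Whitney statistic): the fraction of
    (positive, negative) pairs whose scores are correctly ordered,
    with ties counted as one half."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=int)
    pos = scores[y == 1]
    neg = scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# AUC of a linear index beta^T x (X, beta, y are hypothetical inputs):
# empirical_auc(X @ beta, y)
```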

Biography - Haben Michael is an Assistant Professor at the University of Massachusetts. His research is mainly methodological work in causal inference, developing methods to infer causal relationships without the benefit of fully randomized experiments.

Friday 17 March, 12-1pm - Jialiang Li (National University of Singapore)

Website

This event will take place on Zoom: register here.

Title - A network approach to compute hypervolume under ROC manifold for multi-class biomarkers

Abstract - Computation of the hypervolume under the ROC manifold (HUM) is necessary to evaluate biomarkers for their capability to discriminate among multiple disease types or diagnostic groups. However, the original definition of HUM involves multiple integration, and thus a medical investigation using multi-class ROC analysis can suffer a huge computational cost when the formula is implemented naively. In this paper, we introduce a novel graph-based approach to compute HUM efficiently. The computational method avoids the time-consuming multiple summation when the sample size or the number of categories is large. We conduct extensive simulation studies to demonstrate the improvement of our method over existing R packages. We apply our method to two real biomedical data sets to illustrate its application.
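
As context for the computational point above, here is a rough sketch (in Python, with illustrative variable names) of the naive HUM estimate, i.e. the multiple summation that the graph-based approach is designed to avoid; this is not the method presented in the talk.

```python
import itertools

def naive_hum(samples_by_class):
    """Naive HUM estimate for an ordinal marker: the fraction of tuples
    (one marker value drawn from each of the M ordered classes) whose
    values appear in strictly increasing class order.  The cost grows as
    the product of the class sample sizes, which is the computational
    burden the graph-based method addresses."""
    total = 0
    correct = 0
    for combo in itertools.product(*samples_by_class):
        total += 1
        correct += all(a < b for a, b in zip(combo, combo[1:]))
    return correct / total

# Three diagnostic groups with (illustrative) marker measurements:
# naive_hum([[0.1, 0.4], [0.3, 0.5], [0.7, 0.9]])
```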

Bio - Professor Jialiang Li obtained his Doctor of Philosophy in Statistics and his Master's in Population Health Sciences from the University of Wisconsin-Madison. Prof Li has made contributions to statistical methodology in diagnostic medicine, nonparametric regression, and personalized medicine. He has also collaborated with medical researchers on projects involving the statistical analysis of medical data sets. Prof Li is an elected member of the International Statistical Institute (ISI) and a Fellow of the American Statistical Association (ASA). He has served on the editorial boards of Biometrics, Lifetime Data Analysis, and other journals, and has supervised more than 10 PhD students.

Tuesday 21 March, 12-1pm - Guido Imbens (Stanford)

Website

This event will take place in the Shaw Library.

Please note the change of day for this week only.

Title - Multiple Randomization Designs

Abstract - In this study we introduce a new class of experimental designs. In a classical randomized controlled trial (RCT), or A/B test, a randomly selected subset of a population of units (e.g., individuals, plots of land, or experiences) is assigned to a treatment (treatment A), and the remainder of the population is assigned to the control treatment (treatment B). The difference in average outcome by treatment group is an estimate of the average effect of the treatment. However, motivating our study, the setting for modern experiments is often different, with the outcomes and treatment assignments indexed by multiple populations. For example, outcomes may be indexed by buyers and sellers, by content creators and subscribers, by drivers and riders, or by travelers and airlines and travel agents, with treatments potentially varying across these indices. Spillovers or interference can arise from interactions between units across populations. For example, sellers' behavior may depend on buyers' treatment assignment, or vice versa. This can invalidate the simple comparison of means as an estimator for the average effect of the treatment in classical RCTs. We propose new experiment designs for settings in which multiple populations interact. We show how these designs allow us to study questions about interference that cannot be answered by classical randomized experiments. Finally, we develop new statistical methods for analyzing these Multiple Randomization Designs.
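
As a point of reference, here is a minimal sketch (in Python, with illustrative variable names) of the classical difference-in-means A/B estimator that the abstract contrasts with multiple randomization designs; the new designs themselves are not reproduced here.

```python
import numpy as np

def difference_in_means(outcomes, treated):
    """Classical RCT / A/B estimate of the average treatment effect:
    mean outcome under treatment A minus mean outcome under treatment B.
    Its validity relies on no interference across units, the assumption
    that multiple randomization designs are built to relax."""
    outcomes = np.asarray(outcomes, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    return outcomes[treated].mean() - outcomes[~treated].mean()
```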

Bio - Guido Imbens does research in econometrics and statistics. His research focuses on developing methods for drawing causal inferences in observational studies, using matching, instrumental variables, and regression discontinuity designs.

Guido Imbens is Professor of Economics at the Stanford Graduate School of Business. After graduating from Brown University, Guido taught at Harvard University, UCLA, and UC Berkeley. He holds an honorary degree from the University of St Gallen. Professor Imbens joined the GSB in 2012, where he specializes in econometrics, and in particular in methods for drawing causal inferences. Guido Imbens is a fellow of the Econometric Society and the American Academy of Arts and Sciences.

Take a look at Guido's slides (PDF)

Friday 31 March, 2-3pm - Xiaohong Chen (Yale University)

Website

This event will take place on Zoom: register here.

Please note the time change for this week only.

Title - Inference on Time Series Nonparametric Conditional Moment Restrictions Using General Sieves

Abstract - General nonlinear sieve learnings are classes of nonlinear sieves that can approximate nonlinear functions of high-dimensional variables much more flexibly than various linear sieves (or series). This paper considers general nonlinear sieve quasi-likelihood ratio (GN-QLR) based inference on expectation functionals of time series data, where the functionals of interest are based on some nonparametric function that satisfies conditional moment restrictions and is learned using multilayer neural networks. While the asymptotic normality of the estimated functionals depends on some unknown Riesz representer of the functional space, we show that the optimally weighted GN-QLR statistic is asymptotically chi-square distributed, regardless of whether the expectation functional is regular (root-n estimable) or not. This holds when the data are weakly dependent and satisfy a beta-mixing condition. We apply our method to off-policy evaluation in reinforcement learning, by formulating the Bellman equation into the conditional moment restriction framework, so that we can make inference about the state-specific value functional using the proposed GN-QLR method with time series data. In addition, the averaged partial means and averaged partial derivatives of nonparametric instrumental variables and quantile IV models are also presented as leading examples. Finally, a Monte Carlo study shows the finite sample performance of the procedure.

Bio - Xiaohong Chen is the Malcolm K. Brachman Professor of Economics at Yale University. Previously, Chen taught at the University of Chicago, the London School of Economics, and New York University. She received her PhD in Economics from the University of California, San Diego.

Chen has been an elected member of the American Academy of Arts and Sciences since 2019, a fellow of the Econometric Society since 2007, a founding fellow of the International Association for Applied Econometrics since 2018, a fellow of the Journal of Econometrics since 2012, and an international fellow of Cemmap since 2007. She is a winner of the 2017 China Economics Prize and has been a keynote or invited speaker at many international conferences. She was the 2018 Sargan Lecturer of the Econometric Society, the 2019 Hilda Geiringer Lecturer, and the 2017 Econometric Theory Lecturer.

Chen's research field is econometrics. She is known for her research on penalized sieve estimation and inference for semiparametric and nonparametric models, including semiparametric models of nonlinear time series, empirical asset pricing, copulas, missing data, measurement error, nonparametric instrumental variables, semi/nonparametric conditional moment restrictions, and causal inference.

Chen has published peer-reviewed papers in top-ranked general-purpose journals in economics, including Econometrica and the Review of Economic Studies, as well as in top-ranked journals in statistics and engineering, including the Annals of Statistics, the Journal of the American Statistical Association, IEEE Transactions on Information Theory, and IEEE Transactions on Neural Networks.

Chen has also published several invited review chapters, including a chapter on the method of sieves in the 2007 Handbook of Econometrics, Volume 6B. She has won the Econometric Theory Multa Scripsit Award (2012), the Journal of Nonparametric Statistics Best Paper Award (2010), the Richard Stone Prize in the Journal of Applied Econometrics for 2008 and 2009, and the Arnold Zellner Award for the best theory paper published in the Journal of Econometrics in 2006 and 2007. Her PhD thesis was on stochastic approximation (the Robbins-Monro procedure) in function space for near-epoch dependent processes.

Chen has been an editor of the Journal of Econometrics since January 2019. She has served as an associate editor of Econometrica, the Review of Economic Studies, Quantitative Economics, the Journal of Econometrics, Econometric Theory, the Journal of Nonparametric Statistics, the Econometrics Journal, and others.


Summer Term 2023

Friday 26 May, 12-1pm - Eric Laber (Duke University)

This event will take place in 20 Kingsway (KSW.2.12).

Website

Title - Reinforcement Learning for Respondent-Driven Sampling

Abstract - Respondent-driven sampling (RDS) is a network-based sampling strategy used to study hidden populations for which no sampling frame is available. In each epoch of an RDS study, the current wave of study participants are incentivized to recruit the next wave through their social connections. The success and efficiency of RDS can depend critically on attributes of the incentives and the underlying (latent) network structure. We propose a reinforcement learning-based adaptive RDS design to optimize some measure of study utility, e.g., efficiency, treatment dissemination, or reach. Our design is based on a branching process approximation to the RDS process; however, our proposed post-study inferential procedures apply to general network models even when the network is not fully identified. Simulation experiments show that the proposed design provides substantial gains in efficiency over static and two-step RDS procedures.

Bio - Eric Laber is Professor of Statistical Science in the Trinity College of Arts & Sciences at Duke University, Professor of Biostatistics & Bioinformatics, and Research Professor of Global Health at the Duke Global Health Institute (appointments held since 2021).

Friday 16 June, 2.30-3.30pm - Bingxin Zhao (University of Pennsylvania)

This event will take place on Zoom: register here.

Website

Title - Exploring cross-trait genetic architectures: statistical models, computational challenges, and the BIGA platform

Abstract - Numerous statistical models have been proposed to analyze cross-trait genetic architectures utilizing summary statistics from genome-wide association studies (GWAS). However, systematically analyzing high-dimensional GWAS summary statistics presents logistical and computational challenges. In this talk, we introduce the BIGA platform (http://bigagwas.org/), a website that offers unified data analysis pipelines and centralized data resources. We have developed a framework that implements statistical genetics tools on a cloud computing platform, integrated with extensive curated GWAS datasets. Furthermore, we discuss our recent theoretical analyses of the LD score regression (LDSC), a widely-used method for inferring heritability and genetic correlation using GWAS summary statistics. We provide theoretical guarantees for LDSC-based estimators by explicitly modeling the block-wise dependence pattern of high-dimensional GWAS data. These analyses are joint work with Fei Xue and Yujue Li.
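
For orientation only, here is a toy sketch (in Python, with illustrative variable names) of the regression underlying LDSC, assuming GWAS chi-square statistics, per-SNP LD scores, a sample size N, and M SNPs; it omits the weighting and jackknife used in practice and is not the BIGA pipeline or the estimator analysed in the talk.

```python
import numpy as np

def ldsc_h2_sketch(chi2, ld_scores, N, M):
    """Unweighted LD score regression: fit
        E[chi2_j] = intercept + (N * h2 / M) * ld_score_j
    by ordinary least squares and back out the heritability h2 from
    the slope.  Production LDSC adds heteroskedasticity weights and a
    block jackknife, which are omitted here."""
    chi2 = np.asarray(chi2, dtype=float)
    ld_scores = np.asarray(ld_scores, dtype=float)
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return coef[1] * M / N
```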

Bio - Bingxin Zhao is an assistant professor in the Wharton Statistics and Data Science Department at the University of Pennsylvania. His research interests involve the development and application of statistical and machine learning methods to analyze big data in various fields, including biomedical data science, environmental science, and social science.

Tuesday 20 June, 12-1pm - Elad Romanov (Stanford)

Website

Title - ScreeNOT: Optimal Singular Value Thresholding and Principal Component Selection in Correlated Noise

Abstract - Principal Component Analysis (PCA) is a fundamental and ubiquitous tool in statistics and data analysis.

The bare-bones idea is this. Given a data set of n points y_1, ..., y_n, form their sample covariance S. Eigenvectors corresponding to large eigenvalues (namely, directions along which the variation within the data set is large) are usually thought of as "important" or "signal-bearing"; in contrast, weak directions are often interpreted as "noise" and discarded in subsequent steps of the data analysis pipeline. Principal component (PC) selection is an important methodological question: how large should an eigenvalue be to be considered "informative"?

Our main deliverable is ScreeNOT: a novel, mathematically grounded procedure for PC selection. It is intended as a fully algorithmic replacement for the heuristic and somewhat vaguely defined procedures that practitioners often use, for example the popular "scree test".

Towards tackling PC selection systematically, we model the data matrix as a low-rank signal plus noise, Y = X + Z; accordingly, PC selection is cast as an estimation problem for the unknown low-rank signal matrix X, with the class of permissible estimators being singular value thresholding rules. We consider a formulation of the problem under the spiked model. This asymptotic setting captures some important qualitative features observed across numerous real-world data sets: most of the singular values of Y are arranged neatly in a "bulk", with very few large outlying singular values exceeding the bulk edge. We propose an adaptive algorithm that, given a data matrix, finds the optimal truncation threshold in a data-driven manner under essentially arbitrary noise conditions: we only require that Z has a compactly supported limiting spectral distribution, which may be a priori unknown. Under the spiked model, our algorithm is shown to have rather strong oracle optimality properties: not only does it attain the best error asymptotically, it also achieves (with high probability) the best error, compared to all alternative thresholds, at finite n.

This is joint work with Matan Gavish (Hebrew University of Jerusalem) and David Donoho (Stanford).
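
For concreteness, here is a minimal sketch (in Python, with illustrative variable names) of the singular value thresholding rules mentioned above; choosing the threshold optimally from the data is the contribution of ScreeNOT itself and is not reproduced here.

```python
import numpy as np

def hard_svd_threshold(Y, tau):
    """Hard singular-value thresholding: keep only the singular values of Y
    that exceed the threshold tau and reconstruct the corresponding low-rank
    estimate of the signal X in Y = X + Z.  ScreeNOT supplies the data-driven,
    noise-adaptive choice of tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    keep = s > tau
    return (U[:, keep] * s[keep]) @ Vt[keep, :]
```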

Bio - Elad Romanov is currently a postdoctoral researcher at the Department of Statistics, Stanford, hosted by David Donoho. Before that, he was a PhD student at the School of Computer Science and Engineering, the Hebrew University of Jerusalem, where he was fortunate to be advised by Or Ordentlich and Matan Gavish.


Past seminars 

Please have a look at the STICERD website for details on the past seminars.