Statistics Seminar Series

Statistics is about collecting and analysing data and using it to answer questions about the world, whether in economics, finance or public opinion. Its applications are numerous.

The Department of Statistics hosts this Statistics Seminar Series throughout the year, usually on Friday afternoons at 3pm. Topics include statistical methodology, theory, and applications. All are welcome to attend; the seminars are currently being held remotely.

Michaelmas Term 2020

Friday 9 October 2020, 3-4pm - Anru Zhang 



Title: Statistical Learning for High-dimensional Tensor Data

Abstract: The analysis of tensor data has become an active research topic in this era of big data. Datasets in the form of tensors, or high-order matrices, arise from a wide range of applications, such as financial econometrics, genomics, and material science. In addition, tensor methods provide unique perspectives and solutions to many high-dimensional problems, such as topic modeling and high-order interaction pursuit, where the observations are not necessarily tensors. High-dimensional tensor problems generally possess distinct characteristics that pose unprecedented challenges to the data science community. There is a clear need to develop new methods, efficient algorithms, and fundamental theory to analyze high-dimensional tensor data.

In this talk, we discuss some recent advances in high-dimensional tensor data analysis through the consideration of several fundamental and interrelated problems, including tensor SVD and tensor regression. We illustrate how we develop new statistically optimal methods and computationally efficient algorithms that exploit useful information from high-dimensional tensor data based on the modern theories of computation, high-dimensional statistics, and non-convex optimization.  

Register here

Friday 23 October 2020, 3-4pm - Matthew Reimherr

Matthew Reimherr


Title: KNG - A New Mechanism for Data Privacy

Abstract: Recently it was shown that the exponential mechanism is not asymptotically efficient, introducing too much noise, and thus reducing statistical utility quite broadly. Conversely, objective perturbation enjoys excellent utility, but can be difficult to generalize and requires strong structural assumptions. We show how our new approach, KNG, assuages nearly all of these issues; it is nearly as easy to implement as the exponential mechanism, but has much better asymptotic properties. We highlight how KNG agrees with well known mechanisms in simpler settings, while using its framework to develop new privacy tools in more complicated settings.

Register here

Friday 6 November 2020, 3-4pm - Edgar Dobriban



Title: On the statistical foundations of adversarially robust learning.

Abstract: Robustness has long been viewed as an important desired property of statistical methods. More recently, it has been recognized that complex prediction models such as deep neural nets can be highly vulnerable to adversarially chosen perturbations of their inputs at test time. This area, termed adversarial robustness, has garnered an extraordinary amount of attention in the machine learning community over the last few years. However, little is known about the most basic statistical questions. In this talk, I will present answers to some of them. This is joint work with Hamed Hassani, David Hong, and Alex Robey.

Register here

Friday 20 November 2020, 3-4pm - Stanislav Volgushev



Title: Structure learning for Extremes. 

Abstract: Extremal graphical models are sparse statistical models for multivariate extreme events. The underlying graph encodes conditional independencies and enables a visual interpretation of the complex extremal dependence structure. For the important case of tree models, we develop a data-driven methodology for learning the graphical structure. We show that sample versions of the extremal correlation and a new summary statistic, which we call the extremal variogram, can be used as weights for a minimum spanning tree to consistently recover the true underlying tree. Remarkably, this implies that extremal tree models can be learned in a completely non-parametric fashion by using simple summary statistics and without the need to assume discrete distributions, existence of densities, or parametric models for marginal or bivariate distributions. Extensions to more general graphs are also discussed.  
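As a rough illustration of the minimum spanning tree step mentioned in the abstract, the sketch below runs Prim's algorithm on a symmetric matrix of pairwise weights. It assumes the weights (for example, estimates of the extremal variogram) have already been computed elsewhere; the function name `minimum_spanning_tree` and the toy matrix are hypothetical and not taken from the talk.

```python
import math
from typing import List, Sequence, Tuple


def minimum_spanning_tree(weights: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return the edges (i, j), i < j, of a minimum spanning tree.

    `weights` is a symmetric d x d matrix of pairwise edge weights.
    This is plain Prim's algorithm on a dense matrix, O(d^2).
    """
    d = len(weights)
    in_tree = [False] * d
    best = [math.inf] * d   # cheapest known connection of each node to the tree
    parent = [-1] * d       # tree node that realises that cheapest connection
    edges: List[Tuple[int, int]] = []
    if d == 0:
        return edges
    best[0] = 0.0           # start growing the tree from node 0
    for _ in range(d):
        # Pick the node outside the tree with the cheapest connection.
        u = min((i for i in range(d) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((min(parent[u], u), max(parent[u], u)))
        # Update the cheapest connections of the remaining nodes.
        for v in range(d):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return sorted(edges)


# Toy weights (hypothetical) that are additive along the chain 0 - 1 - 2 - 3.
toy_weights = [
    [0.0, 1.0, 3.0, 4.5],
    [1.0, 0.0, 2.0, 3.5],
    [3.0, 2.0, 0.0, 1.5],
    [4.5, 3.5, 1.5, 0.0],
]
print(minimum_spanning_tree(toy_weights))
```

In practice the weight matrix would be estimated from data on extreme observations, and that estimation step is not shown here.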

Register here 

Friday 4 December 2020, 3-4pm - Tracy Ke

Zheng Ke

Biography: Tracy Ke is currently Assistant Professor of Statistics at Harvard University. She obtained her PhD in Operations Research and Financial Engineering from Princeton University in 2014, advised by Professor Jianqing Fan. From 2014 to 2018, she was Assistant Professor of Statistics at the University of Chicago. She joined Harvard University in 2018. Her research interests include high-dimensional statistics, machine learning, network data analysis, and text mining. In her work on high-dimensional statistics, she is particularly interested in optimal statistical inference when the signals are very rare and weak. In her work on network data analysis, she is particularly interested in estimating the latent community structure of a network. She is the recipient of an NSF CAREER Award and an ASA Noether Young Scholar Award.

Title: Estimating the number of communities in a social network. 

Abstract: Given a symmetric network with n nodes, how to estimate the number of communities K is a fundamental problem in social network analysis. We propose Stepwise Goodness-of-Fit (StGoF) as a new approach to estimating K. For m = 1, 2, ..., StGoF alternately uses a community detection step (pretending m is the correct number of communities) and a goodness-of-fit step. We use a spectral method, SCORE, for community detection, and propose a new goodness-of-fit measure. Denote the goodness-of-fit statistic in step m by ψ(m). We show that as n → ∞, ψ(m) converges to a standard normal distribution when m = K and ψ(m) goes to infinity in probability when m < K. Therefore, with a proper threshold, StGoF terminates at m = K as desired.

We consider a broad setting where we allow severe degree heterogeneity, a wide range of sparsity, and especially weak signals. In particular, we propose a measure for signal-to-noise ratio (SNR) and show that there is a phase transition: when SNR → 0 as n → ∞, consistent estimates for K do not exist, and when SNR → ∞, StGoF is consistent, uniformly for a broad class of settings. In this sense, StGoF achieves the optimal phase transition. 

(Joint work with Jiashun Jin, Shengming Luo, and Minzhe Wang)
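The stepwise scheme described in this abstract can be sketched as a simple loop. The sketch below only shows the control flow: the community detection step and the goodness-of-fit statistic are passed in as callables, and their names (`detect_communities`, `goodness_of_fit`), the threshold and the cap `max_m` are hypothetical placeholders rather than the authors' implementation of SCORE or of ψ(m).

```python
from typing import Callable, Optional, Sequence

Adjacency = Sequence[Sequence[int]]


def stepwise_estimate_k(
    adjacency: Adjacency,
    detect_communities: Callable[[Adjacency, int], Sequence[int]],
    goodness_of_fit: Callable[[Adjacency, Sequence[int], int], float],
    threshold: float,
    max_m: int,
) -> Optional[int]:
    """Return the first m whose goodness-of-fit statistic is at most `threshold`.

    For m = 1, 2, ..., max_m the loop fits m communities, computes the
    statistic psi(m), and stops as soon as psi(m) <= threshold.
    Returns None if no m up to max_m passes.
    """
    for m in range(1, max_m + 1):
        labels = detect_communities(adjacency, m)      # pretend m is correct
        psi = goodness_of_fit(adjacency, labels, m)    # test the fitted model
        if psi <= threshold:
            return m
    return None
```

The stopping rule reflects the behaviour stated in the abstract: the statistic is large when m is below K and approximately standard normal when m equals K, so a normal quantile is one natural choice of threshold.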

Register here


Lent Term 2021

Friday 29 January 2021, 3-4pm - Stephen Bates

 Stephen Bates

Biography: Stephen is a postdoctoral researcher with Michael I. Jordan in the Statistics and EECS departments at UC Berkeley. He works on developing methods to analyze modern scientific data sets, leveraging sophisticated black box models while providing rigorous statistical guarantees. More specifically, he works on high-dimensional statistics (especially false discovery rate control), statistical machine learning, conformal prediction and causal inference.

Previously, Stephen completed his Ph.D. in the Stanford Department of Statistics advised by Emmanuel Candès. His thesis introduced methods for conditional independence testing and false discovery rate control in genomics, and he was honored to receive the Ric Weiland Graduate Fellowship and the Theodore W. Anderson Theory of Statistics Dissertation Award for this work. Before his Ph.D., Stephen studied statistics and mathematics at Harvard University, and then lived abroad teaching mathematics at NYU Shanghai.

Title: Distribution-Free, Risk-Controlling Prediction Sets.

Abstract: To enable valid statistical inference in prediction tasks, we show how to generate set-valued predictions for black-box predictors that control the expected loss on future test points at a user-specified level. Our approach provides explicit finite-sample guarantees for any distribution by using a holdout set to calibrate the size of the prediction sets. We demonstrate our procedure in five large-scale machine learning problems: (1) classification problems where some mistakes are more costly than others; (2) multi-label classification, where each observation has multiple associated labels; (3) classification problems where the labels have a hierarchical structure; (4) image segmentation, where we wish to predict a set of pixels containing an object of interest; and (5) protein structure prediction. 
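To make the idea of calibrating prediction set size on a holdout set concrete, here is a simplified sketch. It assumes losses bounded in [0, 1], a grid of set-size parameters `lambdas` where larger values give larger sets and smaller losses, and a Hoeffding upper confidence bound on the risk; the function name `calibrate_lambda` and these specific choices are illustrative assumptions, and the procedure presented in the talk may differ.

```python
import math
from typing import Optional, Sequence


def calibrate_lambda(
    lambdas: Sequence[float],
    holdout_losses: Sequence[Sequence[float]],
    alpha: float,
    delta: float,
) -> Optional[float]:
    """Pick the smallest set-size parameter whose risk bound stays below alpha.

    lambdas:         candidate parameters in ascending order.
    holdout_losses:  holdout_losses[j] holds the per-example losses in [0, 1]
                     on the holdout set when lambdas[j] is used.
    alpha:           target risk level.
    delta:           allowed failure probability of the confidence bound.

    Scans from the largest lambda downwards and stops at the first one whose
    Hoeffding upper confidence bound on the mean loss reaches alpha.
    Returns None if even the largest lambda fails.
    """
    chosen: Optional[float] = None
    for lam, losses in zip(reversed(lambdas), reversed(holdout_losses)):
        n = len(losses)
        empirical_risk = sum(losses) / n
        # Hoeffding bound: holds with probability at least 1 - delta.
        ucb = empirical_risk + math.sqrt(math.log(1.0 / delta) / (2.0 * n))
        if ucb >= alpha:
            break
        chosen = lam
    return chosen
```

Scanning from the largest parameter downwards keeps the choice conservative: a parameter is accepted only if every larger one also passed the bound.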

Register here

Friday 12 February 2021, 3-4pm - Elizabeth Stuart

Elizabeth Stuart

Biography: Trained as a statistician, Elizabeth's primary research interests are in the development and use of methodology to better design and analyze the causal effects of public health and educational interventions. In this way, she hopes to bridge statistical advances and research practice, working with mental health and educational researchers to identify and solve methodological challenges.

Title: The need for, and challenges of, policy evaluation during the COVID-19 pandemic. 

Abstract: To limit the spread of the novel coronavirus, governments across the world implemented extraordinary physical distancing policies, such as stay-at-home orders, and numerous studies aim to estimate their effects. Many statistical and econometric methods, such as difference-in-differences, leverage repeated measurements and variation in timing to estimate policy effects, including in the COVID-19 context.

However, disentangling policy effects from other factors can be challenging in general, never mind in the context of a pandemic with spillover, infectious disease dynamics, and a wide variety of policies being enacted. This talk will provide an overview of policy evaluation methods, including event studies and policy trial emulation approaches, discuss some of the limitations of standard two-way-fixed-effects type models, and discuss some of the specific issues in using these methods during the pandemic. The work will be motivated using a stylized analysis of the impact of state-level stay-at-home orders on total coronavirus cases. A conclusion is that estimates from panel methods -- with the right data and careful modeling and diagnostics -- can help add to our understanding of many policies, though doing so is often challenging.

Register here

Friday 26 February 2021, 3-4pm - Elizabeth Ogburn

Elizabeth Ogburn

Biography: Elizabeth is an Associate Professor in the Department of Biostatistics at Johns Hopkins University and founder of the COVID-19 Collaboration Platform.

Her research is in causal inference and epidemiologic methods. Broadly, she is interested in developing methods for and describing the behavior of traditional statistical machinery when standard assumptions are not met. Elizabeth has worked on characterizing the bias that results from misclassification, i.e. violations of the assumption that variables were measured accurately. She has also worked on semiparametric estimation of instrumental variables models, as these models are useful for certain violations of “no unmeasured confounding” assumptions. 

Title: Social network dependence, the replication crisis, and (in)valid inference. 

Abstract: In the first part of this talk, we show that social network dependence can result in spurious associations, potentially contributing to replication crises across the health and social sciences. Researchers in these fields frequently sample subjects from one or a small number of communities, schools, hospitals, etc., and while many of the limitations of such convenience samples are well-known, the issue of statistical dependence due to social network ties has not previously been addressed. A paradigmatic example of this is the Framingham Heart Study (FHS). Using a statistic that we adapted to measure network dependence, we test for network dependence and for possible spurious associations in several of the thousands of influential papers published using FHS data. Results suggest that some of the many decades of research on coronary heart disease, other health outcomes, and peer influence using FHS data may suffer from spurious estimates of association and anticonservative uncertainty quantification due to unacknowledged network structure.

But data with network dependence abounds, and in many settings researchers are explicitly interested in learning about social network dynamics.  Therefore, there is high demand for methods for causal and statistical inference with social network data. The second part of the talk describes recent work on causal inference for observational data from a single social network, focusing on (1) new types of causal estimands that are of interest in social network settings, and (2) conditions under which central limit theorems hold and inference based on approximate normality is licensed.

Register here

Friday 12 March 2021, 3-4pm - Eric Tchetgen Tchetgen


Biography: TBC

Title: TBC

Abstract: TBC

Friday 26 March 2021, 3-4pm - Hyunseung Kang


Biography: TBC

Title: TBC

Abstract: TBC