Statistics seminar series

Statistics is about collecting data, analysing it and using it to answer questions about the world, whether in economics, finance or public opinion. The applications are numerous.

The Department of Statistics hosts statistics seminars throughout the year. Seminars take place on Friday afternoons at 2pm in the Leverhulme Library (unless stated otherwise), preceded by lunch at 1pm. All are very welcome to attend! Please contact Penelope Smith for further information about any of the seminars below.

The Leverhulme Library (Room COL 6.15) is located on the sixth floor of Columbia House. Please view the LSE maps and directions.

Current seminars in Michaelmas Term 2017 

Friday 15th September 2017 - Sewoong Oh

Sewoong Oh is an Assistant Professor at the University of Illinois at Urbana-Champaign. Details of his talk are as follows:

Title: Achieving budget-optimality with adaptive schemes in crowdsourcing

Abstract: Crowdsourcing platforms provide marketplaces where task requesters can pay to get labels on their data. Such markets have emerged recently as popular venues for collecting annotations that are crucial in training machine learning models in various applications. However, as jobs are tedious and payments are low, errors are common in such crowdsourced labels. A common strategy to overcome such noise in the answers is to add redundancy by getting multiple answers for each task and aggregating them using a method such as majority voting. For such a system, there is a fundamental question of interest: how can we maximize the accuracy given a fixed budget on how many responses we can collect on the crowdsourcing system? We characterize this fundamental trade-off between the budget (how many answers the requester can collect in total) and the accuracy in the estimated labels. In particular, we ask whether adaptive task assignment schemes lead to a more efficient trade-off between the accuracy and the budget.

Adaptive schemes, where tasks are assigned adaptively based on the data collected thus far, are widely used in practical crowdsourcing systems to efficiently use a given fixed budget. However, existing theoretical analyses of crowdsourcing systems suggest that the gain of adaptive task assignments is minimal. To bridge this gap, we investigate this question under a strictly more general probabilistic model, which has been recently introduced to model practical crowdsourced annotations. Under this generalized Dawid-Skene model, we characterize the fundamental trade-off between budget and accuracy. I will present a novel adaptive task assignment scheme that matches this fundamental limit. This allows us to quantify the fundamental gap between adaptive and non-adaptive schemes, by comparing the trade-off with the one for non-adaptive schemes.  
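For readers unfamiliar with the baseline mentioned in the abstract, the redundancy-plus-majority-voting strategy can be sketched in a few lines of Python. This is only an illustration of the non-adaptive baseline, not the adaptive scheme presented in the talk; the function names, the tie-breaking rule and the toy responses are assumptions made for the example.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common label among one task's answers.

    Assumes `answers` is non-empty. Ties are broken in favour of the
    label that appears first in `answers` (an arbitrary choice made
    for this illustration).
    """
    counts = Counter(answers)
    best = max(counts.values())
    for label in answers:
        if counts[label] == best:
            return label


def aggregate(responses):
    """Map each task to the majority label of its collected answers."""
    return {task: majority_vote(answers) for task, answers in responses.items()}


# Hypothetical toy data: each task was answered by several workers.
responses = {
    "task_1": [1, 1, 0],
    "task_2": [0, 0, 0],
    "task_3": [1, 0, 1, 1, 0],
}
labels = aggregate(responses)
```

Here the total budget is the number of answers collected (11 in the toy data); an adaptive scheme would decide which task to ask about next based on the answers seen so far.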

Friday 6th October 2017 - Jian Qing Shi

Jian Qing Shi is a Reader in Statistics at Newcastle University. Details of his talk are as follows:

Title: Functional Regression Analysis and Variable Selection for Big Medical Movement Data 

Abstract: In this talk, I will present a nonlinear mixed-effects scalar-on-function regression model using a Gaussian process prior. This model is motivated by the analysis of movement data collected in our current joint project on assessing upper limb function after stroke. The talk will focus on a novel variable selection algorithm, namely functional least angle regression (fLARS), and demonstrate how the algorithm can be used to select variables from a large number of candidates, including both scalar and function-valued variables. Numerical results, including a simulation study and an application to the movement data, will also be discussed.

Friday 20th October 2017 - Arthur Gretton

Arthur Gretton - University College London

Title and abstract TBC.



Friday 3rd November 2017 - Shahin Tavakoli

Shahin Tavakoli - University of Warwick

Title and abstract TBC.


Past seminars in 2016/17

Bayesian Aggregation for Extraordinarily Large Dataset

Guang Cheng

Purdue University

19th May 2017 -  2-3pm in the Leverhulme Library

Title: Bayesian Aggregation for Extraordinarily Large Dataset

Abstract: In this talk, a set of scalable Bayesian inference procedures is developed for a general class of nonparametric regression models. Specifically, nonparametric Bayesian inferences are separately performed on each subset randomly split from a massive dataset, and then the obtained local results are aggregated into global counterparts. This aggregation step is explicit without involving any additional computation cost. By a careful partition, we show that our aggregated inference results obtain an oracle rule in the sense that they are equivalent to those obtained directly from the entire data (which are computationally prohibitive). For example, an aggregated credible ball achieves desirable credibility level and also frequentist coverage while possessing the same radius as the oracle ball.

Time-frequency analysis of locally stationary Hawkes processes

Francois Roueff

Télécom ParisTech

5th May 2017 -  2-3pm in the Leverhulme Library

Title: Time-frequency analysis of locally stationary Hawkes processes

Abstract: Self-exciting point processes have recently attracted a lot of interest in applications in the life sciences (seismology, genomics, neuroscience, ...), but also in the modeling of high-frequency financial data. We introduce locally stationary Hawkes processes in order to generalise classical Hawkes processes away from stationarity by allowing for a time-varying second-order structure. A convenient way to reveal this interesting feature on a data set is to perform a time-frequency analysis. We introduce such a tool adapted to non-stationary point processes via non-parametric kernel estimation. Moreover, we provide a fully developed nonparametric estimation theory of both local mean density and local Bartlett spectra of a locally stationary Hawkes process. In particular we apply our kernel estimation to two data sets of transaction times exhibiting time-evolving characteristics in the data that had not been made visible by classical approaches.

Asymptotic theory for quadratic forms of high-dimensional data

Wei Biao Wu  

University of Chicago

24th March 2017 - 2.30-3.30pm in the Leverhulme Library

Title: Asymptotic theory for quadratic forms of high-dimensional data

Abstract: I will present an asymptotic theory for quadratic forms of sample mean vectors of high-dimensional data. An invariance principle for the quadratic forms is derived under conditions that involve a delicate interplay between the dimension p, the sample size n and the moment condition. Under proper normalization, central and non-central limit theorems are obtained. To perform the related statistical inference, I will propose a plug-in calibration method and a re-sampling procedure to approximate the distributions of the quadratic forms. The results will be applied to multiple tests and inference of covariance matrix structures.

Functional data analysis by matrix completion

Victor Panaretos

Ecole Polytechnique Federale de Lausanne

17th March 2017 -  2-3pm in the Leverhulme Library

Title: Functional data analysis by matrix completion

Abstract: Functional data analyses typically proceed by smoothing, followed by functional PCA. This paradigm implicitly assumes that any roughness is due to nuisance noise. Nevertheless, relevant functional features such as time-localised or short scale variations may indeed be rough. These will be confounded with the smooth components of variation by the smoothing/PCA steps, potentially distorting the parsimony and interpretability of the analysis.

We consider the problem of recovering both smooth and rough variations on the basis of discretely observed functional data. Assuming that a functional datum arises as the sum of two uncorrelated components, one smooth and one rough, we develop identifiability conditions for the estimation of the two corresponding covariance operators.

The key insight is that they should possess complementary forms of parsimony: one smooth and of finite rank (large scale), and the other banded and of arbitrary rank (small scale). Our conditions elucidate the precise interplay between rank, bandwidth, and grid resolution. We construct nonlinear estimators of the smooth and rough covariance operators and their spectra via matrix completion, without assuming knowledge of the true bandwidth or rank; we establish their consistency and rates of convergence, and use them to recover the smooth and rough components of each functional datum, effectively producing separate functional PCAs for smooth and rough variation (based on joint work with my PhD student, Marie-Hélène Descary). 

Mediation analysis with more than one mediator

Rhian Daniel

London School of Hygiene & Tropical Medicine

3rd March 2017 -  2-3pm in the Leverhulme Library

Title: Mediation analysis with more than one mediator

Abstract: In diverse fields of empirical research, including many in the biological sciences, attempts are made to decompose the effect of an exposure on an outcome into its effects via different pathways. For example, it is well-established that breast cancer survival rates in the UK differ by socio-economic status. But how much of this effect is due to differential adherence to screening programmes? How much is explained by treatment choices? And so on.

These enquiries, traditionally tackled using simple regression methods, have been given much recent attention in the causal inference literature, specifically in the fruitful area known as Causal Mediation Analysis. The focus has mainly been on so-called natural direct and indirect effects, with flexible estimation methods that allow their estimation in the presence of non-linearities and interactions, and careful consideration given to the need for controlling confounding.

Despite these many developments, the estimation of natural direct and indirect effects is still plagued by one major limitation, namely its reliance on an assumption known as the "cross-world" assumption, an assumption so strong that no experiment could even hypothetically be designed under which its validity would be guaranteed. Moreover, the assumption is known to be violated when confounders of the mediator-outcome association are affected by the exposure, and thus in particular in settings that involve repeatedly measured mediators, or multiple correlated mediators.

In this talk, I will discuss alternative mediation effects known as interventional direct and indirect effects (VanderWeele et al, Epidemiology, 2014), and a novel extension to the multiple mediator setting. This is joint work with Stijn Vansteelandt, Ghent University. We argue that interventional direct and indirect effects are policy-relevant and show that they can be identified under much weaker conditions than natural direct and indirect effects. In particular, they can be used to capture the path-specific effects of an exposure on an outcome that are mediated by distinct mediators, even when, as is often the case, the structural dependence between the multiple mediators is unknown.

The approach will be illustrated using data on breast cancer survival. Finally, I will discuss extensions of this approach to settings with high-dimensional mediators. 

Sub-quadratic recovery of correlated pairs

Graham Cormode

University of Warwick

17th February 2017 - 2-3pm in the Leverhulme Library

Title: Sub-quadratic recovery of correlated pairs

Abstract: Identifying correlations within multiple streams of high-volume time series is a general but challenging problem.  A simple exact solution has cost that is linear in the dimensionality of the data, and quadratic in the number of streams.  In this work, we use dimensionality reduction techniques (sketches), along with ideas derived from coding theory and fast matrix multiplication to allow fast (subquadratic) recovery of those pairs that display high correlation.

Joint work with Jacques Dark.
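The "simple exact solution" referred to in the abstract can be sketched as follows: compute the Pearson correlation for every pair of streams, which costs time linear in the stream length for each pair and quadratic in the number of streams overall. This is a minimal illustration of that baseline only, not the sketch-based subquadratic method of the talk; the function names, the threshold and the toy streams are assumptions made for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(streams, threshold):
    """Return index pairs (i, j), i < j, with |correlation| >= threshold.

    Every pair is examined, so the cost is quadratic in len(streams).
    """
    return [
        (i, j)
        for (i, x), (j, y) in combinations(enumerate(streams), 2)
        if abs(pearson(x, y)) >= threshold
    ]


# Hypothetical toy data: three short streams.
streams = [
    [1, 2, 3, 4],
    [2, 4, 6, 8.5],
    [4, 1, 3, 2],
]
pairs = correlated_pairs(streams, threshold=0.9)
```

With m streams this loop evaluates m(m-1)/2 correlations, which is the quadratic cost that the talk's approach aims to avoid.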

Dirichlet process mixtures of order-sparse data in retail analytics

Ioanna Manolopoulou

University College London

3rd February 2017 - 2-3pm in the Leverhulme Library

Title: Dirichlet process mixtures of order-sparse data in retail analytics

Abstract: The rise of “big data” has led to the frequent need to store and process data sets consisting of large numbers of high dimensional observations. Due to storage restrictions, these observations might be recorded in a lossy-but-sparse manner, with information collapsed onto a few entries which are considered important. This results in informative missingness in the observed data. Our motivating application comes from retail analytics, where the behaviour of product sales is summarised by the price elasticity of each product with respect to a small number of its top competitors. The resulting data comprise vectors of cross-elasticities where only the top few entries are observed. Interest lies in characterising the behaviour of a product’s competitors, and clustering products based on how their competition is spread across the market. We develop nonparametric Bayesian models to represent these partially observed cross-elasticity vectors, which take into account the inherent censoring of the observation process. Our methodology treats the observed cross-elasticity vectors as order statistics sequences of variable length, using a Dirichlet Process Mixture Model with an Exponentiated Weibull kernel. Our approach gives us added flexibility for the distribution of each vector, while readily providing parameters that directly characterise the decay of the leading entries. Inference follows Neal’s (2000) algorithm 8, adapted to the particular context of our model. We implement our methods on a retail analytics dataset of the cross-elasticity coefficients, and our analysis reveals a few distinct types of behaviour across the different products of interest.

Joint work with James Pitkin and Gordon Ross. 


Causal and marginal models

Robin Evans

University of Oxford

20th January 2017 - 2-3pm in the Leverhulme Library

Title: Causal and Marginal Models

Abstract: Many causal parameters of interest, such as those arising in models with observed confounders or sequential treatments, are marginal quantities: that is, they are formed by averaging over a real or hypothetical population.  Several authors, including Havercroft and Didelez (Stat. Med. 31:4190-4206, 2012) and Young and Tchetgen Tchetgen (Stat. Med. 33, 1001-1014, 2014), have noted the practical difficulties of dealing with such quantities, even for discrete data.  This is due to the apparent incompatibility of a marginal parameterisation involving the causal quantity of interest and conditional parametric models used for modelling confounding (either observed or unobserved).  In some cases, the so-called g-null paradox implies that it is logically impossible for the conditional models and the marginal null hypothesis to hold simultaneously.  This means that even simulating from the null model to test new methods is not always possible. In this talk we provide a simple explanation of the g-null paradox, and how to avoid it.  In the discrete case, we adapt existing marginal parameterisations to causal models, allowing us to work with a wide range of causal models including marginal structural models (MSMs), Cox MSMs, structural nested models, and History Adjusted MSMs.  This makes it easy to simulate from and fit models, and allows the introduction of possibly high-dimensional individual-level covariates and the consideration of complex structure including stationarity and symmetry assumptions.  In continuous settings we provide a theoretical overview and some examples of implementation using copula methods.

Joint work with Vanessa Didelez of the Leibniz Institute, Bremen. 

Post-selection inference for models characterized by quadratic constraints

Joshua Loftus

University of Cambridge and Alan Turing Institute

6th December 2016 - 4.10-5pm in the Leverhulme Library

Title: Post-selection inference for models characterized by quadratic constraints

Abstract: To address the fundamental statistical problem of conducting inference after model selection, a recent approach developed in Fithian et al. (2014) and Lee et al. (2016) conditions on the selected model and uses the corresponding truncated probability laws for inference. Though simple to state, the application of this principle varies in difficulty depending on which model selection procedure is under consideration. This work identifies a general mathematical framework encompassing many model selection procedures. The simple algebra of quadratic constraints allows computation of one-dimensional truncated supports for conditional versions of standard test statistics like the chi-squared and F tests used in regression. Several important examples illustrate the utility of this framework, including forward selection with groups of variables and linear model selection with cross-validation.

Residual empirical processes

Hira Koul

Michigan State University

6th December 2016 - 3-3.50pm in the Leverhulme Library

Title: Residual empirical processes

Abstract: Residual empirical processes are known to play a central role in the development of statistical inference in numerous additive models. This talk will discuss some history and some recent advances in the asymptotic uniform linearity of parametric and nonparametric residual empirical processes. We shall also discuss their usefulness in developing asymptotically distribution-free goodness-of-fit tests for fitting an error distribution function in nonparametric ARCH(1) models.

Decorrelated feature space partitioning for distributed sparse regression

Chenlei Leng

University of Warwick

18th November 2016 - 2-3pm in the Leverhulme Library

Title: Decorrelated feature space partitioning for distributed sparse regression

Abstract: Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space). While the majority of the literature focuses on sample space partitioning, feature space partitioning is more effective when p≫n. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In this paper, we solve these problems through a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does NOT depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.  

Bootstrap of degree distribution in large sparse networks

Yulia Gel

University of Texas at Dallas

11th November 2016 - 2-3pm in the Leverhulme Library

Title: Bootstrap of degree distribution in large sparse networks

Abstract: We propose a new method of nonparametric bootstrap to quantify estimation uncertainties in functions of network degree distribution in large ultra sparse networks. Both network degree distribution and network order are assumed to be unknown. The key idea is based on adaptation of the "blocking" argument, developed for bootstrapping of time series and re-tiling of spatial data, to random networks. We first sample a set of multiple ego networks of varying orders that form a patch, or a network block analogue, and then resample the data within patches. To select an optimal patch size, we develop a new computationally efficient and data-driven cross-validation algorithm. In our simulation study, we show that the new fast patchwork bootstrap (FPB) outperforms competing approaches by providing sharper and better calibrated confidence intervals for functions of a network degree distribution, including the cases of networks in an ultra sparse regime. In addition, the FPB is substantially less computationally expensive, requires less information on a graph, and is free from nuisance parameters. We illustrate the FPB in application to collaboration networks in statistics and computer science and to Wikipedia networks.

Domain prediction of complex indicators: model-based methods, transformations and robust alternatives

Nikos Tzavidis

University of Southampton

4th November 2016 - 2-3pm in the Leverhulme Library

Title: Domain prediction of complex indicators: model-based methods, transformations and robust alternatives

Abstract: Small Area (Domain) prediction of complex indicators, for example deprivation and inequality indicators, typically relies on micro-simulation/model-based methods that use regression models with domain-specific random effects. When the Gaussian assumptions for the model error terms are met, Empirical Best Prediction (EBP) for domains is possible and should be preferred. In this talk we will present current research on alternative methodologies when the model assumptions are violated. To start with, we will discuss the use of transformations, focusing mainly on power and scaled transformations, for trying to ensure the validity of the EBP assumptions. Transformations can help improve estimation, but even small departures from the model assumptions can adversely affect estimation of parameters closer to the tails of the distribution and estimation of the Mean Squared Error. We will then outline alternative, possibly more robust model-based methodologies. These methods are based on the use of a random effects model for the quantiles of the empirical distribution function that exploits the link between maximum likelihood estimation and the use of the Asymmetric Laplace Distribution as a working assumption. The talk will also briefly outline work on the use of this latter method with discrete outcomes, in particular count outcomes.


Faithful variable screening for high-dimensional convex regression

Min Xu

University of Pennsylvania

28th October 2016 - 2-3pm in the Leverhulme Library

Title: Faithful variable screening for high-dimensional convex regression

Abstract: We study the problem of variable selection in convex nonparametric regression. Under the assumption that the true regression function is convex and sparse, we develop a screening procedure to select a subset of variables that contains the relevant variables. Our approach is a two-stage quadratic programming method that estimates a sum of one-dimensional convex functions, followed by one-dimensional concave regression fits on the residuals. In contrast to previous methods for sparse additive models, the optimization is finite dimensional and requires no tuning parameters for smoothness. Under appropriate assumptions, we prove that the procedure is faithful in the population setting, yielding no false negatives. We give a finite sample statistical analysis, and introduce algorithms for efficiently carrying out the required quadratic programs. The approach leads to computational and statistical advantages over fitting a full model, and provides an effective, practical approach to variable screening in convex regression. Joint work with Minhua Chen and John Lafferty. 

Dynamic copulas and market risk forecasting

Flavio Ziegelmann

Federal University of Rio Grande do Sul

Title: Dynamic copulas and market risk forecasting

Abstract: In this talk we propose forecasting portfolio market risk measures, such as Value at Risk (VaR) and Expected Shortfall (ES), via dynamic copula modelling. For that we describe several dynamic copula models, from naive ones to complex factor copulas. The latter are able to tackle the curse of dimensionality while simultaneously introducing a high level of complexity into the model. We start with bi-dimensional copulas, then move to vine copulas when increasing the dimension moderately, and finally jump to factor copulas for high dimensional portfolios. In the factor copula case we allow for different levels of flexibility in the dynamics of the dependence parameters, which are driven by a GAS (Generalized Autoregressive Score) model. Throughout the talk, we show some numerical analyses for both simulated and real data sets.

Generalized SURE for optimal shrinkage of singular values in low-rank matrix denoising

Jeremie Bigot

University of Bordeaux

21st October 2016 - 2-3pm in the Leverhulme Library

Title: Generalized SURE for optimal shrinkage of singular values in low-rank matrix denoising

Abstract: We consider the problem of estimating a low-rank signal matrix from noisy measurements under the assumption that the distribution of the data matrix belongs to an exponential family. In this setting, we derive generalized Stein's unbiased risk estimation (SURE) formulas that hold for any spectral estimator which shrinks or thresholds the singular values of the data matrix. This leads to new data-driven shrinkage rules, whose optimality is discussed using tools from random matrix theory and through numerical experiments. Under the spiked population model and in the asymptotic setting where the dimensions of the data matrix tend to infinity, some theoretical properties of our approach are compared to recent results on asymptotically optimal shrinking rules for Gaussian noise. It also leads to new procedures for singular value shrinkage in finite-dimensional matrix denoising for Gaussian, Poisson or Gamma-distributed measurements.

Large additive models for large datasets: modelling 4 decades of daily pollution data over the UK

Simon Wood

University of Bristol

14th October 2016 - 2-3pm in the Leverhulme Library

Title: Large additive models for large datasets: modelling 4 decades of daily pollution data over the UK

Abstract: The UK 'black smoke' monitoring network has produced daily particulate air pollution data from a network of up to 2000 monitoring stations over several decades, resulting in >10^7 measurements in total. Spatio-temporal modelling of the data is desirable in order to produce daily exposure estimates for cohort studies, for example. Generalized additive models/Latent Gaussian process models offer one way to do this if we can deal with the data volume and model size. This talk will discuss the development of methods for estimating generalized additive models with of the order of 10^4 coefficients, from of the order of 10^8 observations. The strategy combines 4 elements: (i) the use of rank reduced smoothers, (ii) fine scale discretization of covariates, (iii) an efficient approach to marginal likelihood optimization that avoids computation of numerically awkward log determinant terms and (iv) marginal likelihood optimization algorithms that make good use of numerical linear algebra methods with reasonable scalability on modern multi-core processors. 600-fold speed-ups can be achieved relative to the previous state-of-the-art methods. This enables us to estimate spatio-temporal models for UK black smoke data over the last 4 decades at a daily resolution, where previously an annual resolution was challenging.

Some issues in generalized linear modeling

Alan Agresti

University of Florida

12th October 2016 - 4-5.30pm in Thai Theatre, NAB

Title: Some issues in generalized linear modeling

Abstract: This talk discusses several topics pertaining to generalized linear modeling.  With focus on categorical data, the topics include (1) bias in using ordinary linear models with ordinal categorical response data, (2) interpreting effects with nonlinear link functions, (3) cautions in using Wald inference (tests and confidence intervals) when effects are large or near the boundary of the parameter space, and (4) the behavior and choice of residuals for GLMs.  I will present few new research results, but these topics got my attention while I was writing the book "Foundations of Linear and Generalized Linear Models," recently published by Wiley.

Diffusion models in neuroscience and finance

Satish Iyengar

University of Pittsburgh

7th October 2016 - 2-3pm in the Leverhulme Library

Title: Diffusion models in neuroscience and finance

Abstract: Stochastic models of neural activity are a well-developed application in biology. Diffusion models for integrate-and-fire (I-F) neurons hold a prominent place because of the many synaptic inputs to a neuron, and because these models arise out of noisy versions of differential equations for the neural membrane's electrical properties. I will describe a leaky I-F model which leads to a reflecting Ornstein-Uhlenbeck process. I will then address the problem of maximum likelihood estimation of the parameters of this model when only the firing times corresponding to the first passage times are available. I will then describe a two-dimensional diffusion model arising from a simple network and its use in finance. The coefficient of tail dependence is a quantity that measures how extreme events in one component of a bivariate distribution depend on extreme events in the other component. It is well-known that the Gaussian copula has zero tail dependence, a shortcoming for its application in credit risk modeling and quantitative risk management in general. We show that this property is shared by the joint distributions of hitting times of bivariate (uniformly elliptic) diffusion processes.