Statistics is all about getting data, analysing it and using it to answer questions about the world, be that in terms of economics, finance or public opinion. The applications are numerous.

The Department of Statistics hosts Statistics seminars throughout the year. Seminars take place on Friday afternoons at 2pm in the Leverhulme Library (unless stated otherwise) with refreshments preceding at 1.30pm. All are very welcome to attend! Please contact Kayleigh Brewer at k.brewer@lse.ac.uk for further information about any of these seminars below.

The Leverhulme Library (Room COL 6.15) is located on the sixth floor of Columbia House. Please view the LSE maps and directions.

Title: Bayesian Dynamic Tensor Regression

Abstract:

Multidimensional arrays (i.e. tensors) of data are becoming increasingly available and call for suitable econometric tools. We propose a new dynamic linear regression model for tensor-valued response variables and covariates that encompasses some well-known multivariate models such as SUR, VAR, VECM, panel VAR and matrix regression models as special cases. To deal with the over-parametrization and over-fitting issues due to the curse of dimensionality, we exploit a suitable parametrization based on the parallel factor (PARAFAC) decomposition, which makes it possible both to achieve parameter parsimony and to incorporate sparsity effects. Our contribution is twofold: first, we provide an extension of multivariate econometric models to account for both tensor-variate response and covariates; second, we show the effectiveness of the proposed methodology in defining an autoregressive process for time-varying real economic networks. Inference is carried out in the Bayesian framework combined with Markov chain Monte Carlo (MCMC). We show the efficiency of the MCMC procedure on simulated datasets with different sizes of the response and independent variables, demonstrating computational efficiency even when the parameter space is high-dimensional. Finally, we apply the model to study the temporal evolution of real economic networks.

Title: On sparsity scales and covariance matrix transformations

Abstract:

In many statistical contexts, for example in linear regression and discriminant analysis, a covariance or concentration matrix is a nuisance parameter, distinct from interest parameters which should always have a direct subject-matter interpretation. It seems sensible to model explicitly only those aspects of direct concern and retain a level of agnosticism over other aspects. This has important implications in high-dimensional estimation problems, in which an assumption of sparsity is critical.

I will introduce continua of sparsity scales for covariance matrices, leading to sparsity on the original, inverse and matrix logarithmic scales as special cases. After discussing some special features of the logarithmic scale, I will present a theory of estimation appropriate for any given or estimated scale when the matrix dimension is larger than the sample size.

A corollary of the work is that a constrained optimization-based approach is unnecessary for estimating a sparse concentration matrix. Some open theoretical problems over misspecified sparsity structure are highlighted, with insights from simulations.

Title: Skew and multi-tailed multivariate distributions - a need in finance

Abstract:

It is a well-known fact that financial data exhibit heavy tails and, often, skewness. In response to the shortcomings of the multinormal distribution as a model for financial data, the class of elliptically symmetric distributions (including the multivariate t-distribution) has been widely accepted, as it allows for heavier-than-normal tails.

However, in situations where negative returns are much more extreme than positive returns, the assumption of elliptical symmetry is too restrictive. Moreover, a further restriction of elliptical distributions lies in the fact that they are governed by a scalar radial function, which implies that the tails are governed by a one-dimensional tail-weight parameter like in the multivariate t distribution.

In this talk I will first present new efficient tests for elliptical symmetry against skew-ellipticity based on the Le Cam theory of asymptotic experiments. With these new tests, I shall analyze financial data consisting of daily returns data of several major worldwide indexes. In the second part of my talk, I will present various models of flexible multivariate distributions from the literature and compare them in the light of the needs of financial data. This comparison is based both on properties of the distributions and a simulation study.

This is joint work with Sladjana Babic, Marc Hallin and David Veredas.

http://users.ugent.be/~chley/#/home

**Seminars in Michaelmas Term 2018**

Title: Geometric MCMC for Bayesian Inverse Problems

Abstract:

Bayesian Inverse Problems often involve sampling posterior distributions on infinite-dimensional function spaces. Traditional Markov chain Monte Carlo (MCMC) algorithms are characterized by deteriorating mixing times upon mesh-refinement, when the finite-dimensional approximations become more accurate. Such methods are typically forced to reduce step-sizes as the discretization gets finer, thus are expensive as a function of dimension. Recently, a new class of MCMC methods with mesh-independent convergence times has emerged. However, few of them take into account the geometry of the posterior informed by the data. At the same time, recently developed geometric MCMC algorithms have been found to be powerful in exploring complicated distributions that deviate significantly from elliptic Gaussian laws, but are in general computationally intractable for models defined in infinite dimensions. In this work, we combine geometric methods on a finite-dimensional subspace with mesh-independent infinite-dimensional approaches. Our objective is to speed up MCMC mixing times, without significantly increasing the computational cost per step (for instance, in comparison with the vanilla preconditioned Crank–Nicolson (pCN) method). This is achieved by using ideas from geometric MCMC to probe the complex structure of an intrinsic finite-dimensional subspace where most data information concentrates, while retaining robust mixing times as the dimension grows by using pCN-like methods in the complementary subspace. The resulting algorithms are demonstrated in the context of three challenging Inverse Problems arising in subsurface flow, heat conduction and incompressible flow control. The algorithms exhibit up to two orders of magnitude improvement in sampling efficiency when compared with the pCN method.
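For readers unfamiliar with the pCN baseline mentioned in this abstract, the following is a minimal sketch of one standard form of the sampler, not the geometric algorithms presented in the talk. It assumes a standard Gaussian prior N(0, I) on a finite-dimensional discretisation and a user-supplied negative log-likelihood; the names `pcn_chain`, `phi` and `beta` are illustrative and do not come from the talk.

```python
import math
import random


def pcn_chain(phi, x0, beta, n_steps, seed=0):
    """Run a preconditioned Crank-Nicolson chain targeting exp(-phi(x)) * N(0, I).

    phi     : negative log-likelihood (hypothetical, supplied by the user)
    x0      : starting point, a list of floats
    beta    : step size in (0, 1]
    n_steps : number of proposals to make
    """
    rng = random.Random(seed)
    x = list(x0)
    phi_x = phi(x)
    shrink = math.sqrt(1.0 - beta * beta)
    chain = [list(x)]
    accepted = 0
    for _ in range(n_steps):
        # The proposal preserves the N(0, I) prior:
        # x' = sqrt(1 - beta^2) * x + beta * noise
        proposal = [shrink * x_i + beta * rng.gauss(0.0, 1.0) for x_i in x]
        phi_prop = phi(proposal)
        # The acceptance ratio involves only the likelihood, not the prior
        log_alpha = phi_x - phi_prop
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x, phi_x = proposal, phi_prop
            accepted += 1
        chain.append(list(x))
    return chain, accepted / max(n_steps, 1)
```

A call such as `pcn_chain(lambda x: 0.5 * sum((v - 1.0) ** 2 for v in x), [0.0] * 10, beta=0.2, n_steps=1000)` returns the chain and the acceptance rate. Because the proposal leaves the prior invariant, the acceptance rule does not depend on the dimension of `x` through the prior, which is the property usually credited for the mesh-independent behaviour described in the abstract.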

Title: Distributed estimation of principal eigenspaces

Abstract:

Modern data sets are often decentralized; they are generated and stored in multiple sources across which the communication is constrained by bandwidth or privacy. This talk focuses on distributed estimation of principal eigenspaces of covariance matrices with decentralized data. We introduce and analyze a distributed algorithm that aggregates multiple principal eigenspaces through averaging the corresponding projection matrices. When the data distribution has sign-symmetric innovation, the distributed PCA is proved to be “unbiased” such that its statistical error will converge to zero as the number of data splits grows to infinity. For general distributions, when the number of data splits is not large, this algorithm is shown to achieve the same statistical efficiency as the full-sample oracle. We applied our algorithm to implement a distributed partition of the traffic network of Manhattan; the distributed procedure delivered partition results similar to those of the centralized procedure, provided that the number of data splits is not large.
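The aggregation step described in this abstract, averaging projection matrices across splits, can be illustrated with a small sketch. It covers only the one-dimensional case (a single leading direction) and uses plain power iteration, which assumes the starting vector is not orthogonal to the leading eigenvector; the function names are hypothetical and this is not the speakers' implementation.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]


def top_eigenvector(matrix, iters=200):
    """Leading eigenvector of a symmetric PSD matrix by power iteration."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def distributed_top_direction(splits):
    """Average the rank-one projection matrices v v^T from each split,
    then return the leading eigenvector of the average."""
    d = len(splits[0][0])
    avg = [[0.0] * d for _ in range(d)]
    for rows in splits:
        v = top_eigenvector(covariance(rows))
        for i in range(d):
            for j in range(d):
                avg[i][j] += v[i] * v[j] / len(splits)
    return top_eigenvector(avg)
```

Averaging the projection matrices `v v^T` rather than the vectors themselves sidesteps the sign ambiguity of eigenvectors: `v` and `-v` give the same projection, so each split only needs to communicate its local projection.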

Title: Optimal change point detection and localization in Sparse dynamic networks.

Abstract: We study the problem of change point detection and localization in dynamic networks. We assume that we observe a sequence of independent adjacency matrices of given size, each corresponding to one realization from an unknown inhomogeneous Bernoulli model. The underlying distribution of the adjacency matrices may change over a subset of the time points, called change points. Our task is to recover with high accuracy the unknown number and positions of the change points. Our generic model setting allows for all the model parameters to change with the total number of time points, including the network size, the minimal spacing between consecutive change points, the magnitude of the smallest change and the degree of sparsity of the networks. We first identify an impossible region in the space of the model parameters such that no change point estimator is provably consistent if the data are generated according to parameters falling in that region. We propose a computationally simple novel algorithm for network change point localization, called Network Binary Segmentation, which relies on weighted averages of the adjacency matrices. We show that Network Binary Segmentation is consistent over a range of the model parameters that nearly cover the complement of the impossibility region, thus demonstrating the existence of a phase transition for the problem at hand. Next, we devise a more sophisticated algorithm based on singular value thresholding, called Local Refinement, that delivers more accurate estimates of the change point locations. We show that, under appropriate conditions, Local Refinement guarantees a minimax optimal rate for network change point localization while remaining computationally feasible.
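As a rough illustration of the kind of statistic that change point localization builds on, here is a sketch of a classical single-change CUSUM scan applied to the edge density of each adjacency matrix. This is a deliberately simplified stand-in: it is not the Network Binary Segmentation or Local Refinement procedure from the talk, and the function names are hypothetical.

```python
import math


def edge_density(adj):
    """Fraction of ones among the off-diagonal entries of an adjacency matrix."""
    n = len(adj)
    total = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def cusum_change_point(values):
    """Return (t, stat): the split index t maximising the CUSUM statistic,
    so that values[:t] and values[t:] are the two estimated segments."""
    n = len(values)
    best_t, best_stat = None, -1.0
    for t in range(1, n):
        left = sum(values[:t]) / t
        right = sum(values[t:]) / (n - t)
        # Weighted difference of segment means
        stat = math.sqrt(t * (n - t) / n) * abs(left - right)
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat
```

Binary segmentation in general applies a scan of this kind recursively to each resulting segment, stopping when the maximal statistic falls below a threshold. Reducing each network to a single density, as done here, discards changes that leave the overall density unchanged.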

Title: Isotonic regression in general dimensions

Abstract: We study the least squares regression function estimator over the class of real-valued functions on $[0,1]^d$ that are increasing in each coordinate. For uniformly bounded signals and with a fixed, cubic lattice design, we establish that the estimator achieves the minimax rate of order $n^{-\min\{2/(d+2),1/d\}}$ in the empirical $L_2$ loss, up to poly-logarithmic factors. Further, we prove a sharp oracle inequality, which reveals in particular that when the true regression function is piecewise constant on $k$ hyperrectangles, the least squares estimator enjoys a faster, adaptive rate of convergence of $(k/n)^{\min(1,2/d)}$, again up to poly-logarithmic factors.

Previous results are confined to the case $d \leq 2$. Finally, we establish corresponding bounds (which are new even in the case $d=2$) in the more challenging random design setting. There are two surprising features of these results: first, they demonstrate that it is possible for a global empirical risk minimisation procedure to be rate optimal up to poly-logarithmic factors even when the corresponding entropy integral for the function class diverges rapidly; second, they indicate that the adaptation rate for shape-constrained estimators can be strictly worse than the parametric rate.
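For orientation, the least squares estimator is easy to compute in the simplest case $d = 1$ using the classical pool-adjacent-violators algorithm. The sketch below covers that one-dimensional case only; the general-dimension estimator studied in the talk requires other computational tools, and the function name is illustrative.

```python
def isotonic_fit(y):
    """Least squares fit of a non-decreasing sequence to y (pool adjacent violators)."""
    blocks = []  # each block is [sum of values, count]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the last block mean falls below the previous one
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted
```

The fitted values are piecewise constant, with each constant piece equal to the average of the observations pooled into it. This is the one-dimensional analogue of the piecewise constant structure on hyperrectangles that drives the adaptive rate in the abstract.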

Title: Are you sure you can use your estimated p-value?

Abstract: The p-value is probably the most important figure in a statistical hypothesis test. However, the true p-value is unknown most of the time, and it is common practice to use the limiting distribution of a test statistic to estimate the p-value. How accurate is the estimated p-value? Are the true and estimated p-values really close? In this talk we reveal the secret of the relative error of the estimated p-value against the true p-value for some well-known statistics.
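The question raised in this abstract can be made concrete with a toy case in which the true p-value is available in closed form. The sketch below compares an exact one-sided binomial p-value with its estimate from the normal limit; the binomial setting and all function names are illustrative assumptions, not examples taken from the talk.

```python
import math


def exact_binomial_pvalue(k, n, p0=0.5):
    """Exact one-sided p-value P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(math.comb(n, j) * p0 ** j * (1 - p0) ** (n - j)
               for j in range(k, n + 1))


def normal_approx_pvalue(k, n, p0=0.5):
    """One-sided p-value from the normal limit, without continuity correction."""
    z = (k - n * p0) / math.sqrt(n * p0 * (1 - p0))
    return 0.5 * math.erfc(z / math.sqrt(2))


def relative_error(estimated, true):
    """Signed relative error of an estimated p-value against the true one."""
    return (estimated - true) / true
```

For 9 successes in 10 trials under `p0 = 0.5`, the exact p-value is 11/1024, while the normal approximation gives a markedly smaller value. The absolute difference between the two is small, but the relative error is large, which is the distinction the abstract draws attention to.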

***Please note that this seminar is on a Monday and takes place from 4-5pm in the Graham Wallas Room which is a different day, time and room from our normal Statistics seminars.***

Bin Yu is a Professor in the Department of Statistics and of Electrical Engineering & Computer Sciences at University of California at Berkeley. Take a look below at the title and abstract of her talk and her webpage.

**Title:** Three principles of data science: predictability, computability, and stability

**Abstract:** In this talk, I'd like to discuss the importance and connections of three principles of data science in the title and introduce the PCS workflow for the data science life cycle. PCS will be demonstrated in the context of two collaborative projects in neuroscience and genomics, respectively. The first project in neuroscience uses transfer learning to integrate fitted convolutional neural networks (CNNs) on ImageNet with regression methods to provide predictive and stable characterizations of neurons from the challenging primary visual cortex V4.

Our DeepTune characterization provides a rich description of the diverse V4 selection patterns. The second project proposes iterative random forests (iRF) as a stabilized RF to seek predictable and interpretable high-order interactions among biomolecules. For an enhancer status prediction problem for Drosophila based on high-throughput data, iRF was able to find 20 stable gene-gene interactions, of which 80% had been physically verified in the literature in the past few decades. Last but not least, the data results from both projects provide experimentally testable hypotheses and hence PCS can also serve as a scientific recommendation system for follow-up experiments.

Qiyang Han is a final-year PhD student at the University of Washington in Seattle. He will join Rutgers as an Assistant Professor from Autumn 2018. Take a look below at the title and abstract of his talk and his webpage.

**Title:** LSE: Beyond Gaussian regression models

**Abstract:** We study the convergence rate of the least squares estimator (LSE) in a regression model with possibly heavy-tailed errors. Despite its importance in practical applications, theoretical understanding of this problem has been limited. We first show that from a worst-case perspective, the convergence rate of the LSE in a general non-parametric regression model is given by the maximum of the Gaussian regression rate and the noise rate induced by the errors. In the more difficult statistical model where the errors only have a second moment, we further show that the sizes of the 'localized envelopes' of the model give a sharp interpolation for the convergence rate of the LSE between the worst-case rate and the (optimal) parametric rate. These results indicate both certain positive and negative aspects of the LSE as an estimation procedure in a heavy-tailed regression setting. The key technical innovation is a new multiplier inequality that sharply controls the size of the multiplier empirical process associated with the LSE, which also finds applications in shape-restricted and sparse linear regression problems.

**Past seminars in Lent Term 2018**

Jinyuan Chang, Southwestern University of Finance and Economics

Jinyuan Chang is a Professor of Statistics and Econometrics at Southwestern University of Finance and Economics (Chengdu, China). See Prof. Chang's website.

**Title**: High-dimensional statistical inferences with over-identification: Confidence set estimation and specification test

**Abstract**: The influential Generalized Methods of Moments (GMM) (Hansen, 1982) are popular for model construction and statistical inferences without requiring specifying a full parametric probability distribution. Over-identification is a signature feature of GMM that can accommodate more moment conditions than model parameters. High-dimensional statistical problems plus over-identification are challenging yet remain less explored. In this paper we investigate high-dimensional statistical inferences in presence of over-identification. Our concerns are two-fold: first, statistical inferences for low-dimensional components of the high-dimensional model parameters, and second, a specification test for the validity of the over-identified moment conditions. The two problems are solved using high-dimensional empirical likelihood. For the first problem, we propose to estimate a confidence set based on a new construction of estimating equations mapped from the original ones to a low-dimensional space with a linear transformation matrix whose rows are nearly orthogonal to the column space of the gradient matrix with respect to the nuisance parameters. For the second problem, we consider the univariate marginal empirical likelihood ratios respectively corresponding to each component of the high-dimensional moment conditions. By evaluating the marginal empirical likelihood ratios at a sparse estimator of the model parameter, we show that a specification test with over-identifications can be developed for assessing the validity of the moment conditions. Our theoretical analysis establishes the validity of the proposed procedures for statistical inferences, and our numerical examples demonstrate good performance of our proposed methods with high-dimensional problems.

Regina Liu is a Professor at Rutgers University - the State University of New Jersey. Take a look below at the title and abstract of her talk and her webpage.

**Title:** Nonparametric Tolerance Tubes for Tracking Functional Data

**Abstract:** Tolerance intervals and tolerance regions are important tools for process monitoring or statistical quality control of univariate and multivariate data, respectively. We discuss their generalization to tolerance tubes in the infinite dimensional setting for functional data. In addition to the generalizations of the commonly accepted definitions of the tolerance level of beta-content or beta-expectation, we introduce the new definition of alpha-exempt beta-expectation tolerance tube. The latter loosens the definition of beta-expectation tolerance tube by allowing an alpha portion (pre-set using domain knowledge) of each functional to be exempt from the requirement.

Those proposed tolerance tubes are completely nonparametric and broadly applicable. We discuss their general properties, and show that the alpha-exempt beta-expectation tolerance tube is particularly useful in the setting where occasional short term aberrations of the functional data are deemed acceptable (or unpreventable) and do not cause substantive deviation from the norm. This desirable property is elaborated further and illustrated with both simulations and real applications in continuous monitoring of blood glucose level in diabetes patients as well as of aviation risk patterns of aircraft landings.

This is joint work with Dr. Yi Fan, Amazon.com, Inc.

Po-ling is an Assistant Professor at the University of Wisconsin. Take a look below at the title and abstract of her talk and her webpage.

**Title:** Influence maximization in stochastic and adversarial settings

**Abstract:** We consider the problem of influence maximization in fixed networks, for both stochastic and adversarial contagion models. Such models may be used to model infection spreads in epidemiology, as well as the diffusion of information in viral marketing. In the stochastic setting, nodes are infected in waves according to linear threshold or independent cascade models. We establish upper and lower bounds for the influence of a subset of nodes in the network, where the influence is defined as the expected number of infected nodes at the conclusion of the epidemic. We quantify the gap between our upper and lower bounds in the case of the linear threshold model and illustrate the gains of our upper bounds for independent cascade models in relation to existing results. In the adversarial setting, an adversary is allowed to specify the edges through which contagion may spread, and the player chooses sets of nodes to infect in successive rounds. Our main result is to establish upper and lower bounds on the regret for possibly stochastic strategies of the adversary and player. This is joint work with Justin Khim (UPenn) and Varun Jog (UW-Madison).

Matteo is a Professor at University of Bologna. Take a look below at the title and abstract of his talk and webpage.

**Title:** Covariance matrix and factor model estimation by composite minimization

**Abstract:** In this talk, we address the problem of covariance matrix and factor model estimation in large dimensions under the low rank plus sparse assumption. Existing approaches based on PCA (POET, Fan et al. 2013) fail to catch low rank spaces characterized by non-spiked eigenvalues, as in that case the asymptotic consistency of PCA established in (Bai, 2003) no longer holds. For this reason, UNALCE (UNshrunk ALgebraic Covariance Estimator), an alternative approach based on the solution of a low rank plus sparse decomposition problem, has been developed in (Farnè and Montanari, 2017). Given the finite sample, this method is shown to produce the covariance estimate with the least possible dispersed eigenvalues among all the matrices having the same rank of the low rank component and the same support of the sparse component. In addition, consistency and recovery are guaranteed until pα log(p) n, where p is the dimension and n is the sample size, provided that latent eigenvalues scale to $p^{\alpha}$, $\alpha \in [0, 1]$. Consequently, if p and n are fixed, exploiting the eigenvalue dispersion lemma in (Ledoit and Wolf, 2004) we can prove that loadings and factor scores estimated via UNALCE provide the tightest possible error bound. Simulation results show that UNALCE is particularly effective relative to POET for recovering the proportion of latent variance, as well as the proportion of residual covariance and the number of non-zeros. Unlike POET, UNALCE exactly recovers the latent rank and the residual sparsity pattern, also showing better fitting properties. It is also shown that UNALCE factor estimates are particularly useful if latent eigenvalues are not spiked and the sparse component is very sparse. Two real data sets, regarding UK market data and ECB supervisory data respectively, provide us with further insights about the usefulness of UNALCE in practical applications.

Davide is a Lecturer in Statistics at King's College London. Take a look below at the title and abstract of his talk and webpage.

**Title:** Estimation of growth processes in forensic entomology

**Abstract:** Functional data are examples of high-dimensional data where the observed variables are generated by an underlying smooth process. This allows us to develop methods that go beyond what would be possible with classical multivariate techniques. In this talk, I will demonstrate the potential of functional data analysis for biological growth processes in forensic entomology, where there is a need to estimate time-dependent growth curves from experiments where larvae have been exposed to a relatively small number of constant temperature profiles. I will discuss the properties of the proposed estimator and show the results for a couple of real crime scene scenarios.

**Past seminars in Michaelmas Term 2017**

Yundong Tu is an Assistant Professor at Guanghua School of Management, Peking University. Details of his talk are as follows:

Title - Spurious Regressions in Functional-coefficient Models with Nonstationarity

Abstract - Functional coefficient cointegrating models have become popular to model nonlinear nonstationarity in econometrics (Cai, Li, Park, 2009; Xiao 2009). However, there has been little study of testing for the existence of functional coefficient cointegration. Consequently, functional coefficient regressions involving nonstationary regressors may be spurious. This paper investigates the effect that spurious functional coefficient regression has on the usual diagnostics. We find that common characteristics of spurious regression are manifest, including divergent local significance tests, random local goodness-of-fit, and local Durbin-Watson ratio converging to zero, complementing those discovered in spurious linear and nonparametric regressions (Phillips, 1986; Phillips, 2009). In addition, spuriousness causes the divergence of the global significance tests proposed by Xiao (2009) and Sun, Cai and Li (2016), which is likely to produce misleading conclusions for practitioners. To resolve the problems, we propose a simple-to-implement inference procedure based on a semiparametric balanced regression, by augmenting regressors of the original spurious regression with lagged dependent variable and independent variables. This procedure achieves spurious regression detection via standard inferential asymptotics. Monte Carlo simulations show that the proposed tests enjoy nice finite sample performance.

Karthik Bharath is an Assistant Professor at the School of Mathematical Sciences, University of Nottingham. Details of his talk are as follows:

Title - Sampling of warp maps for curve alignment

Abstract - Alignment of functional or curve data in the presence of important landmark information is integral to analysis of biomedical data. I will discuss a sampling scheme for warp maps used in alignment of open and closed curves, possibly with landmark constraints. The scheme provides a point process-based constructive definition of a probability measure on the set of warp maps of [0, 1] and the unit circle. The measure is used (i) as a proposal distribution in a stochastic algorithm to solve a variational formulation of curve alignment, and (ii) as a prior on warp maps in a Bayesian model for alignment.

Shahin Tavakoli is an Assistant Professor at the University of Warwick. Details of his talk are as follows:

Title: A Spatial Modeling Approach for Linguistic Object Data: Analysing dialect sound variations across Great Britain

Abstract: Dialect variation is of considerable interest in linguistics and other social sciences. However, traditionally it has been studied using proxies (transcriptions) rather than acoustic recordings directly. We introduce novel statistical techniques to analyse geolocalised speech recordings and to explore the spatial variation of pronunciations continuously over the region of interest, as opposed to traditional isoglosses, which provide a discrete partition of the region. Data of this type require an explicit modeling of the variation in the mean and the covariance. Usual Euclidean metrics are not appropriate, and we therefore introduce the concept of d-covariance, which allows consistent estimation both in space and at individual locations. We then propose spatial smoothing for these objects which accounts for the possibly non convex geometry of the domain of interest. We apply the proposed method to data from the spoken part of the British National Corpus, deposited at the British Library, London, and we produce maps of the dialect variation over Great Britain. In addition, the methods allow for acoustic reconstruction across the domain of interest, allowing researchers to listen to the statistical analysis. This is joint work with Davide Pigoli and John Aston (Cambridge), and John Coleman (Oxford).

Arthur Gretton is a Professor at University College London. Details of his talk are as follows:

Title: Learning interpretable features to compare distributions

Abstract: I will present adaptive two-sample tests with optimised testing power and interpretable features. The tests will be based on the maximum mean discrepancy (MMD), a difference in the expectations of features under the two distributions being tested. Useful features are defined as being those which contribute a large divergence between distributions with high confidence. These features can be defined explicitly (points in space); or implicitly via a reproducing kernel, in which case there may be infinitely many of them. As an example application, I will test for subtle differences in the distribution of real hand-written digits and the distribution of digits obtained from a generative model (for instance, small imbalances in the proportions of certain digits, or minor distortions that are implausible in normal handwriting). The statistical tests are able to reliably find differences which humans are unable to perceive. Related, interpretable tests can be constructed for benchmarking and troubleshooting generative models, in a goodness-of-fit setting; testing for statistical dependence; and testing for multi-way interaction.
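To make the MMD concrete, here is a minimal sketch of the standard biased (V-statistic) estimate of the squared MMD with a fixed Gaussian kernel. It assumes a hand-picked bandwidth and does not include the feature or kernel optimisation for test power that the talk is about; the function names are illustrative.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(ps, qs):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in ps for q in qs)
        return total / (len(ps) * len(qs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

The estimate is zero when the two samples coincide and grows as they separate. Turning it into a test requires a null distribution for the statistic, which in practice is often approximated by permuting the pooled sample; that step is omitted here.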

Jian Qing Shi is a Reader in Statistics at Newcastle University. Details of his talk are as follows:

Title: Functional Regression Analysis and Variable Selection for Big Medical Movement Data

Abstract: In this talk, I will present a nonlinear mixed-effects scalar-on-function regression model using a Gaussian process prior. This model is motivated by the analysis of movement data which are collected in our current joint project on assessing upper limbs' function after stroke. The talk will focus on a novel variable selection algorithm, namely functional least angle regression (fLARS), and demonstrate how the algorithm can be used to do variable selection from a large number of candidates including both scalar and function-valued variables. Numerical results including a simulation study and an application to the movement data will also be discussed.

Sewoong Oh is an Assistant Professor at University of Illinois at Urbana-Champaign. Details of his talk are as follows:

Title: Achieving budget-optimality with adaptive schemes in crowdsourcing

Abstract: Crowdsourcing platforms provide marketplaces where task requesters can pay to get labels on their data. Such markets have emerged recently as popular venues for collecting annotations that are crucial in training machine learning models in various applications. However, as jobs are tedious and payments are low, errors are common in such crowdsourced labels. A common strategy to overcome such noise in the answers is to add redundancy by getting multiple answers for each task and aggregating them using a method such as majority voting. For such a system, there is a fundamental question of interest: how can we maximize the accuracy given a fixed budget on how many responses we can collect on the crowdsourcing system? We characterize this fundamental trade-off between the budget (how many answers the requester can collect in total) and the accuracy in the estimated labels. In particular, we ask whether adaptive task assignment schemes lead to a more efficient trade-off between the accuracy and the budget.

Adaptive schemes, where tasks are assigned adaptively based on the data collected thus far, are widely used in practical crowdsourcing systems to efficiently use a given fixed budget. However, existing theoretical analyses of crowdsourcing systems suggest that the gain of adaptive task assignments is minimal. To bridge this gap, we investigate this question under a strictly more general probabilistic model, which has been recently introduced to model practical crowdsourced annotations. Under this generalized Dawid-Skene model, we characterize the fundamental trade-off between budget and accuracy. I will present a novel adaptive task assignment scheme that matches this fundamental limit. This allows us to quantify the fundamental gap between adaptive and non-adaptive schemes, by comparing the trade-off with the one for non-adaptive schemes.
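The redundancy-plus-aggregation baseline mentioned in this abstract, majority voting over multiple answers per task, can be sketched in a few lines. This is only the simple non-adaptive aggregation step, not the adaptive assignment scheme from the talk; the data layout and function name are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(responses):
    """Aggregate redundant worker answers into one label per task.

    responses: dict mapping task id -> list of labels from different workers.
    Ties are broken in favour of the label that was seen first.
    """
    return {task: Counter(labels).most_common(1)[0][0]
            for task, labels in responses.items()}
```

Here the budget is simply the total number of labels across all tasks. Majority voting spends it uniformly and weights every worker equally, which is the behaviour an adaptive scheme tries to improve on by directing further answers to the tasks that remain uncertain.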

**Past seminars in 2016/17**

19th May 2017 - 2-3pm in the Leverhulme Library

*Title:* Bayesian Aggregation for Extraordinarily Large Dataset

*Abstract:* In this talk, a set of scalable Bayesian inference procedures is developed for a general class of nonparametric regression models. Specifically, nonparametric Bayesian inferences are separately performed on each subset randomly split from a massive dataset, and then the obtained local results are aggregated into global counterparts. This aggregation step is explicit without involving any additional computation cost. By a careful partition, we show that our aggregated inference results obtain an oracle rule in the sense that they are equivalent to those obtained directly from the entire data (which are computationally prohibitive). For example, an aggregated credible ball achieves desirable credibility level and also frequentist coverage while possessing the same radius as the oracle ball.

5th May 2017 - 2-3pm in the Leverhulme Library

*Title:* Time-frequency analysis of locally stationary Hawkes processes

*Abstract:* Self-exciting point processes have recently attracted a lot of interest in applications in the life sciences (seismology, genomics, neuroscience, ...), but also in the modeling of high-frequency financial data. We introduce locally stationary Hawkes processes in order to generalise classical Hawkes processes away from stationarity by allowing for a time-varying second-order structure. A convenient way to reveal this interesting feature in a data set is to perform a time-frequency analysis. We introduce such a tool adapted to non-stationary point processes via non-parametric kernel estimation. Moreover, we provide a fully developed nonparametric estimation theory of both local mean density and local Bartlett spectra of a locally stationary Hawkes process. In particular, we apply our kernel estimation to two data sets of transaction times exhibiting time-evolving characteristics in the data that had not been made visible by classical approaches.

24th March 2017 - 2.30-3.30pm in the Leverhulme Library

*Title:* Asymptotic theory for quadratic forms of high-dimensional data

*Abstract:* I will present an asymptotic theory for quadratic forms of sample mean vectors of high-dimensional data. An invariance principle for the quadratic forms is derived under conditions that involve a delicate interplay between the dimension p, the sample size n and the moment condition. Under proper normalization, central and non-central limit theorems are obtained. To perform the related statistical inference, I will propose a plug-in calibration method and a re-sampling procedure to approximate the distributions of the quadratic forms. The results will be applied to multiple tests and inference of covariance matrix structures.

Ecole Polytechnique Federale de Lausanne

17th March 2017 - 2-3pm in the Leverhulme Library

*Title:* Functional data analysis by matrix completion

*Abstract:* Functional data analyses typically proceed by smoothing, followed by functional PCA. This paradigm implicitly assumes that any roughness is due to nuisance noise. Nevertheless, relevant functional features such as time-localised or short scale variations may indeed be rough. These will be confounded with the smooth components of variation by the smoothing/PCA steps, potentially distorting the parsimony and interpretability of the analysis.

We consider the problem of recovering both smooth and rough variations on the basis of discretely observed functional data. Assuming that a functional datum arises as the sum of two uncorrelated components, one smooth and one rough, we develop identifiability conditions for the estimation of the two corresponding covariance operators.

The key insight is that they should possess complementary forms of parsimony: one smooth and of finite rank (large scale), and the other banded and of arbitrary rank (small scale). Our conditions elucidate the precise interplay between rank, bandwidth, and grid resolution. We construct nonlinear estimators of the smooth and rough covariance operators and their spectra via matrix completion, without assuming knowledge of the true bandwidth or rank; we establish their consistency and rates of convergence, and use them to recover the smooth and rough components of each functional datum, effectively producing separate functional PCAs for smooth and rough variation (based on joint work with my PhD student, Marie-Hélène Descary).

London School of Hygiene & Tropical Medicine

3rd March 2017 - 2-3pm in the Leverhulme Library

*Title:* Mediation analysis with more than one mediator

*Abstract:* In diverse fields of empirical research, including many in the biological sciences, attempts are made to decompose the effect of an exposure on an outcome into its effects via different pathways. For example, it is well-established that breast cancer survival rates in the UK differ by socio-economic status. But how much of this effect is due to differential adherence to screening programmes? How much is explained by treatment choices? And so on.

These enquiries, traditionally tackled using simple regression methods, have been given much recent attention in the causal inference literature, specifically in the fruitful area known as Causal Mediation Analysis. The focus has mainly been on so-called natural direct and indirect effects, with flexible estimation methods that allow their estimation in the presence of non-linearities and interactions, and careful consideration given to the need for controlling confounding.

Despite these many developments, the estimation of natural direct and indirect effects is still plagued by one major limitation, namely its reliance on an assumption known as the "cross-world" assumption, an assumption so strong that no experiment could even hypothetically be designed under which its validity would be guaranteed. Moreover, the assumption is known to be violated when confounders of the mediator-outcome association are affected by the exposure, and thus in particular in settings that involve repeatedly measured mediators, or multiple correlated mediators.

In this talk, I will discuss alternative mediation effects known as interventional direct and indirect effects (VanderWeele et al., Epidemiology, 2014), and a novel extension to the multiple mediator setting. This is joint work with Stijn Vansteelandt, University of Gent. We argue that interventional direct and indirect effects are policy-relevant and show that they can be identified under much weaker conditions than natural direct and indirect effects. In particular, they can be used to capture the path-specific effects of an exposure on an outcome that are mediated by distinct mediators, even when, as is often the case, the structural dependence between the multiple mediators is unknown.

The approach will be illustrated using data on breast cancer survival. Finally, I will discuss extensions of this approach to settings with high-dimensional mediators.

17th February 2017 - 2-3pm in the Leverhulme Library

*Title:* Sub-quadratic recovery of correlated pairs

*Abstract:* Identifying correlations within multiple streams of high-volume time series is a general but challenging problem. A simple exact solution has cost that is linear in the dimensionality of the data, and quadratic in the number of streams. In this work, we use dimensionality reduction techniques (sketches), along with ideas derived from coding theory and fast matrix multiplication to allow fast (subquadratic) recovery of those pairs that display high correlation.

Joint work with Jacques Dark.
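
For orientation, the "simple exact solution" referred to in the abstract can be sketched as a brute-force comparison of every pair of streams. This is only the quadratic baseline, not the sketch-based subquadratic method of the talk; the helper names `pearson` and `correlated_pairs`, the use of absolute correlation and the threshold value are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences.

    Cost is linear in the length (dimensionality) of the streams.
    Assumes neither stream is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(streams, threshold):
    """Return index pairs (i, j), i < j, with |correlation| >= threshold.

    Every pair is examined, so the cost is quadratic in the number of streams.
    """
    return [
        (i, j)
        for (i, x), (j, y) in combinations(enumerate(streams), 2)
        if abs(pearson(x, y)) >= threshold
    ]


# Toy data (hypothetical): stream 1 is an exact multiple of stream 0.
streams = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [5, 1, 4, 2, 3],
]
pairs = correlated_pairs(streams, threshold=0.9)
```

With m streams of length d this performs m(m-1)/2 correlation computations of cost O(d) each, which is the quadratic scaling the talk aims to improve on.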

3rd February 2017 - 2-3pm in the Leverhulme Library

*Title:* Dirichlet process mixtures of order-sparse data in retail analytics

*Abstract:* The rise of “big data” has led to the frequent need to store and process data sets consisting of large numbers of high dimensional observations. Due to storage restrictions, these observations might be recorded in a lossy-but-sparse manner, with information collapsed onto a few entries which are considered important. This results in informative missingness in the observed data. Our motivating application comes from retail analytics, where the behaviour of product sales is summarised by the price elasticity of each product with respect to a small number of its top competitors. The resulting data comprise vectors of cross-elasticities where only the top few entries are observed. Interest lies in characterising the behaviour of a product’s competitors, and clustering products based on how their competition is spread across the market. We develop nonparametric Bayesian models to represent these partially observed cross-elasticity vectors, which take into account the inherent censoring of the observation process. Our methodology treats the observed cross-elasticity vectors as order statistics sequences of variable length, using a Dirichlet Process Mixture Model with an Exponentiated Weibull kernel. Our approach allows us added flexibility for the distribution of each vector, while readily providing parameters that directly characterise the decay of the leading entries. Inference follows Neal’s (2000) algorithm 8, adapted to the particular context of our model. We implement our methods on a retail analytics dataset of the cross-elasticity coefficients, and our analysis reveals a few distinct types of behaviour across the different products of interest.

Joint work with James Pitkin and Gordon Ross.

20th January 2017 - 2-3pm in the Leverhulme Library

*Title:* Causal and Marginal Models

*Abstract:* Many causal parameters of interest, such as those arising in models with observed confounders or sequential treatments, are marginal quantities: that is, they are formed by averaging over a real or hypothetical population. Several authors, including Havercroft and Didelez (Stat. Med. 31:4190-4206, 2012) and Young and Tchetgen Tchetgen (Stat. Med. 33, 1001-1014, 2014), have noted the practical difficulties of dealing with such quantities, even for discrete data. This is due to the apparent incompatibility of a marginal parameterisation involving the causal quantity of interest and conditional parametric models used for modelling confounding (either observed or unobserved). In some cases, the so-called g-null paradox implies that it is logically impossible for the conditional models and the marginal null hypothesis to hold simultaneously. This means that even simulating from the null model to test new methods is not always possible. In this talk we provide a simple explanation of the g-null paradox, and how to avoid it. In the discrete case, we adapt existing marginal parameterisations to causal models, allowing us to work with a wide range of causal models including marginal structural models (MSMs), Cox MSMs, structural nested models, and History Adjusted MSMs. This makes it easy to simulate from and fit models, and allows the introduction of possibly high-dimensional individual-level covariates and the consideration of complex structure including stationarity and symmetry assumptions. In continuous settings we provide a theoretical overview and some examples of implementation using copula methods.

*Joint work with Vanessa Didelez of the Leibniz Institute, Bremen.*

University of Cambridge and Alan Turing Institute

6th December 2016 - 4.10-5pm in the Leverhulme Library

*Title:* Post-selection inference for models characterized by quadratic constraints

*Abstract:* To address the fundamental statistical problem of conducting inference after model selection, a recent approach formed in Fithian et al. (2014) and Lee et al. (2016) conditions on the selected model and uses the corresponding truncated probability laws for inference. Though simple to state, the application of this principle varies in difficulty depending on which model selection procedure is under consideration. This work identifies a general mathematical framework encompassing many model selection procedures. The simple algebra of quadratic constraints allows computation of one-dimensional truncated supports for conditional versions of standard test statistics like the chi-squared and F tests used in regression. Several important examples illustrate the utility of this framework, including forward selection with groups of variables and linear model selection with cross-validation.

6th December 2016 - 3-3.50pm in the Leverhulme Library

*Title:* Residual empirical processes

*Abstract:* Residual empirical processes are known to play a central role in the development of statistical inference in numerous additive models. This talk will discuss some history and some recent advances in the asymptotic uniform linearity of parametric and nonparametric residual empirical processes. We shall also discuss their usefulness in developing asymptotically distribution free goodness-of-fit tests for fitting an error distribution function in nonparametric ARCH(1) models.

18th November 2016 - 2-3pm in the Leverhulme Library

*Title:* Decorrelated feature space partitioning for distributed sparse regression

*Abstract:* Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space). While the majority of the literature focuses on sample space partitioning, feature space partitioning is more effective when p≫n. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In this paper, we solve these problems through a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does NOT depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.

11th November 2016 - 2-3pm in the Leverhulme Library

*Title:* Bootstrap of degree distribution in large sparse networks

*Abstract:* We propose a new method of nonparametric bootstrap to quantify estimation uncertainties in functions of network degree distribution in large ultra sparse networks. Both network degree distribution and network order are assumed to be unknown. The key idea is based on adaptation of the "blocking" argument, developed for bootstrapping of time series and re-tiling of spatial data, to random networks. We first sample a set of multiple ego networks of varying orders that form a patch, or a network block analogue, and then resample the data within patches. To select an optimal patch size, we develop a new computationally efficient and data-driven cross-validation algorithm. In our simulation study, we show that the new fast patchwork bootstrap (FPB) outperforms competing approaches by providing sharper and better calibrated confidence intervals for functions of a network degree distribution, including the cases of networks in an ultra sparse regime. In addition, the FPB is substantially less computationally expensive, requires less information on a graph, and is free from nuisance parameters. We illustrate the FPB in application to collaboration networks in statistics and computer science and to Wikipedia networks.

4th November 2016 - 2-3pm in the Leverhulme Library

*Title:* Domain prediction of complex indicators: model-based methods, transformations and robust alternatives

*Abstract:* Small Area (Domain) prediction of complex indicators, for example deprivation and inequality indicators, typically relies on micro-simulation/model-based methods that use regression models with domain-specific random effects. When the Gaussian assumptions for the model error terms are met, Empirical Best Prediction (EBP) for domains is possible and should be preferred. In this talk we will present current research on alternative methodologies when the model assumptions are misspecified. To start with, we will discuss the use of transformations, focusing mainly on power and scaled transformations, for trying to ensure the validity of the EBP assumptions. Transformations can help improve estimation, but even small departures from the model assumptions can adversely impact upon estimation of parameters closer to the tails of the distribution and on estimation of the Mean Squared Error. We will then outline alternative, possibly more robust model-based methodologies. These methods are based on the use of a random effects model for the quantiles of the empirical distribution function that exploits the link between maximum likelihood estimation and the use of the Asymmetric Laplace Distribution as a working assumption. The talk will also briefly outline work on the use of this latter method with discrete outcomes, in particular count outcomes.
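
To make the idea of a power transformation concrete, here is a minimal sketch of the standard Box-Cox transformation and its inverse. Treating Box-Cox as the transformation of interest is an assumption: the abstract does not say which power or scaled families the talk considers, and the `incomes` values and the choice of exponent 0.5 are purely hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transformation of a strictly positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)  # limiting case as lam -> 0
    return (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


# Toy right-skewed values (hypothetical); lam = 0.5 is a square-root-type transform.
incomes = [1.0, 4.0, 9.0, 100.0]
transformed = [box_cox(y, 0.5) for y in incomes]
recovered = [inverse_box_cox(z, 0.5) for z in transformed]
```

In a model-based workflow the model would be fitted on the transformed scale and predictions mapped back with the inverse before computing the indicator of interest.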

28th October 2016 - 2-3pm in the Leverhulme Library

*Title:* Faithful variable screening for high-dimensional convex regression

*Abstract:* We study the problem of variable selection in convex nonparametric regression. Under the assumption that the true regression function is convex and sparse, we develop a screening procedure to select a subset of variables that contains the relevant variables. Our approach is a two-stage quadratic programming method that estimates a sum of one-dimensional convex functions, followed by one-dimensional concave regression fits on the residuals. In contrast to previous methods for sparse additive models, the optimization is finite dimensional and requires no tuning parameters for smoothness. Under appropriate assumptions, we prove that the procedure is faithful in the population setting, yielding no false negatives. We give a finite sample statistical analysis, and introduce algorithms for efficiently carrying out the required quadratic programs. The approach leads to computational and statistical advantages over fitting a full model, and provides an effective, practical approach to variable screening in convex regression. Joint work with Minhua Chen and John Lafferty.

Federal University of Rio Grande do Sul

*Title:* Dynamic copulas and market risk forecasting

*Abstract:* In this talk we propose forecasting portfolio market risk measures, such as Value at Risk (VaR) and Expected Shortfall (ES), via dynamic copula modelling. To that end, we describe several dynamic copula models, from naive ones to complex factor copulas. The latter are able to tackle the curse of dimensionality while simultaneously introducing a high level of complexity into the model. We start with bi-dimensional copulas, then move to vine copulas as the dimension increases moderately, and finally jump to factor copulas for high-dimensional portfolios. In the factor copula case we allow for different levels of flexibility in the dynamics of the dependence parameters, which are driven by a GAS (Generalized Autoregressive Score) model. Throughout the talk, we show some numerical analyses for both simulated and real data sets.
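
The two risk measures named in the abstract can be illustrated with a simple empirical calculation on a sample of losses. This is not the copula-based forecast described in the talk: the function `empirical_var_es`, the quantile convention and the toy losses are assumptions, and other conventions for ES (for example averaging only losses strictly beyond VaR) are also in use.

```python
import math


def empirical_var_es(losses, alpha):
    """Empirical Value at Risk and Expected Shortfall at level alpha.

    Losses are positive numbers for money lost. VaR is taken as the
    ceil(alpha * n)-th smallest loss; ES is the mean of the losses at or
    beyond VaR (one of several conventions in use).
    """
    ordered = sorted(losses)
    k = math.ceil(alpha * len(ordered)) - 1  # index of the empirical alpha-quantile
    value_at_risk = ordered[k]
    tail = ordered[k:]
    expected_shortfall = sum(tail) / len(tail)
    return value_at_risk, expected_shortfall


# Toy portfolio losses (hypothetical), e.g. simulated from a fitted model.
losses = [-3.0, -1.0, 0.0, 1.0, 2.0, 4.0, 6.0, 9.0]
var_75, es_75 = empirical_var_es(losses, alpha=0.75)
```

In a forecasting setting, the `losses` sample would typically be generated by simulating from the fitted dependence model rather than taken from historical data.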

21st October 2016 - 2-3pm in the Leverhulme Library

*Title:* Generalized SURE for optimal shrinkage of singular values in low-rank matrix denoising

*Abstract:* We consider the problem of estimating a low-rank signal matrix from noisy measurements under the assumption that the distribution of the data matrix belongs to an exponential family. In this setting, we derive generalized Stein's unbiased risk estimation (SURE) formulas that hold for any spectral estimators which shrink or threshold the singular values of the data matrix. This leads to new data-driven shrinkage rules, whose optimality is discussed using tools from random matrix theory and through numerical experiments. Under the spiked population model and in the asymptotic setting where the dimensions of the data matrix are allowed to go to infinity, some theoretical properties of our approach are compared to recent results on asymptotically optimal shrinking rules for Gaussian noise. It also leads to new procedures for singular value shrinkage in finite-dimensional matrix denoising for Gaussian, Poisson or Gamma-distributed measurements.

14th October 2016 - 2-3pm in the Leverhulme Library

*Title:* Large additive models for large datasets: modelling 4 decades of daily pollution data over the UK

*Abstract:* The UK 'black smoke' monitoring network has produced daily particulate air pollution data from a network of up to 2000 monitoring stations over several decades, resulting in >10^7 measurements in total. Spatio-temporal modelling of the data is desirable in order to produce daily exposure estimates for cohort studies, for example. Generalized additive models/Latent Gaussian process models offer one way to do this if we can deal with the data volume and model size. This talk will discuss the development of methods for estimating generalized additive models having of order 10^4 coefficients, from of order 10^8 observations. The strategy combines 4 elements: (i) the use of rank reduced smoothers, (ii) fine scale discretization of covariates, (iii) an efficient approach to marginal likelihood optimization that avoids computation of numerically awkward log determinant terms, and (iv) marginal likelihood optimization algorithms that make good use of numerical linear algebra methods with reasonable scalability on modern multi-core processors. 600-fold speed-ups can be achieved relative to the previous state of the art methods. This enables us to estimate spatio-temporal models for UK black smoke data over the last 4 decades at a daily resolution, where previously an annual resolution was challenging.

12th October 2016 - 4-5.30pm in Thai Theatre, NAB

*Title:* Some issues in generalized linear modeling

*Abstract:* This talk discusses several topics pertaining to generalized linear modeling. With focus on categorical data, the topics include (1) bias in using ordinary linear models with ordinal categorical response data, (2) interpreting effects with nonlinear link functions, (3) cautions in using Wald inference (tests and confidence intervals) when effects are large or near the boundary of the parameter space, and (4) the behavior and choice of residuals for GLMs. I will present few new research results, but these topics got my attention while I was writing the book "Foundations of Linear and Generalized Linear Models," recently published by Wiley.

7th October 2016 - 2-3pm in the Leverhulme Library

*Title:* Diffusion models in neuroscience and finance

*Abstract:* Stochastic models of neural activity are a well developed application in biology. Diffusion models for integrate-and-fire (I-F) neurons hold a prominent place because of the many synaptic inputs to a neuron, and because these models arise out of noisy versions of differential equations for the neural membrane's electrical properties. I will describe a leaky I-F model which leads to a reflecting Ornstein-Uhlenbeck process. I will then address the problem of maximum likelihood estimation of the parameters of this model when only the firing times corresponding to the first passage times are available. I will then describe a two-dimensional diffusion model arising from a simple network and its use in finance. The coefficient of tail dependence is a quantity that measures how extreme events in one component of a bivariate distribution depend on extreme events in the other component. It is well-known that the Gaussian copula has zero tail dependence, a shortcoming for its application in credit risk modeling and quantitative risk management in general. We show that this property is shared by the joint distributions of hitting times of bivariate (uniformly elliptic) diffusion processes.
