
Department of Statistics

Columbia House

London School of Economics

Houghton Street

London

WC2A 2AE

 

Columbia House is located at 69 Aldwych

 

LSE Campus Maps and Directions

 

Enquiries about our seminars should be addressed to:

 

Penelope Smith

p.a.smith@lse.ac.uk

Statistics Events

statistics.events@lse.ac.uk

 


Statistics Seminar Series 2015-16

The Department of Statistics hosts statistics seminars throughout the year. Seminars take place on Friday afternoons at 2pm, unless otherwise stated, in the Leverhulme Library. All are very welcome to attend. Please contact Events for further information about any of these seminars.

The Leverhulme Library (Room COL 6.15) is located on the sixth floor of Columbia House, 69 Aldwych. Please view the LSE maps and directions here.


9 October 2015

Patrick Wolfe

UCL (Department of Statistical Science)

Network analysis and nonparametric statistics

Abstract: Networks are ubiquitous in today's world. Any time we make observations about people, places, or things and the interactions between them, we have a network. Yet a quantitative understanding of real-world networks is in its infancy, and must be based on strong theoretical and methodological foundations. The goal of this talk is to provide some insight into these foundations from the perspective of nonparametric statistics, in particular how trade-offs between model complexity and parsimony can be balanced to yield practical algorithms with provable properties.


 23 October 2015

Nikolas Kantas

Imperial College London (Department of Mathematics)

Sequential Monte Carlo Methods for High-Dimensional Inverse Problems

Abstract: We consider the inverse problem of estimating the initial condition of a partial differential equation, which is only observed through noisy measurements at discrete time intervals. In particular, we focus on the case where Eulerian measurements are obtained from the time and space evolving vector field, whose evolution obeys the two-dimensional Navier-Stokes equations defined on a torus. We will adopt a Bayesian formulation resulting from a particular regularisation that ensures the problem is well posed. In the context of Monte Carlo based inference, it is a challenging task to obtain samples from the resulting high dimensional posterior on the initial condition. We will propose a generic adaptive Sequential Monte Carlo (SMC) sampling approach for high dimensional inverse problems that overcomes some of these challenges. The method can be used for a wider class of inverse problems and builds upon appropriate Markov chain Monte Carlo (MCMC) techniques, which are currently considered as benchmarks for evaluating data assimilation algorithms used in practice. In our numerical examples, the proposed SMC approach achieves the same accuracy as MCMC but in a much more efficient manner. If time permits we will discuss some extensions of these ideas for high dimensional non-linear filtering problems.

The talk is based on joint works with Alexandros Beskos (UCL), Ajay Jasra (NUS), Alexandre Thiery (NUS) and Dan Crisan (Imperial).
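As a point of reference only (and not the adaptive algorithm of the talk), a minimal tempered SMC sampler for a generic Bayesian inverse problem might look as follows; the function names, the fixed temperature ladder, and the random-walk rejuvenation step are illustrative assumptions.

```python
import numpy as np

def tempered_smc(log_lik, sample_prior, log_prior, n_particles=500,
                 temps=np.linspace(0.0, 1.0, 21), rw_scale=0.2, seed=0):
    """Minimal tempered-SMC sketch with a fixed temperature ladder:
    reweight by increasing powers of the likelihood, resample, and
    rejuvenate with one random-walk Metropolis step per temperature."""
    rng = np.random.default_rng(seed)
    x = sample_prior(n_particles, rng)             # (n_particles, dim)
    for t_prev, t in zip(temps[:-1], temps[1:]):
        logw = (t - t_prev) * log_lik(x)           # incremental weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x = x[idx]                                  # multinomial resampling
        prop = x + rw_scale * rng.standard_normal(x.shape)
        log_acc = (t * log_lik(prop) + log_prior(prop)
                   - t * log_lik(x) - log_prior(x))
        accept = np.log(rng.random(n_particles)) < log_acc
        x[accept] = prop[accept]                    # MH rejuvenation move
    return x                                        # approximate posterior draws
```

In practice the temperature schedule, resampling times, and MCMC kernel would all be chosen adaptively, which is the kind of refinement the talk addresses.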


6 November 2015

Johan Koskinen

University of Manchester (School of Social Sciences)

Estimation and prediction for multilevel exponential random graph models with partially observed data

Abstract: Social network data typically consist of a set of nodes – representing people, organisations, or other social units – and a set of ties – representing some meaningful relation on the social units. Statistical methods have been used to analyse social network data at least since the early 1930s. Exponential random graph models (ERGMs) are a class of exponential-family, log-linear models for network data that explicitly take into account the dependencies between tie-variables. The basic premise is that we want to allow a tie between two nodes to potentially depend on other ties in some well-defined neighbourhood. A number of specific dependence assumptions have been proposed in the literature that imply models that have proven to fit data well. The complex dependencies do, however, mean that ERGMs do not marginalise, which has adverse consequences for inference from partially observed or sampled network data. We consider here inference strategies for estimating ERGMs from such imperfectly observed data. In particular, we consider sampling on so-called multilevel networks, where there are different types of nodes, for example people and the organisations they belong to.
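For readers unfamiliar with ERGMs, the standard textbook form of the model for a random graph Y with network statistics g(.) and parameters theta is:

```latex
P_\theta(Y = y) \;=\; \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) \;=\; \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} g(y')\},
```

where g(y) collects statistics such as edge, star, and triangle counts, and the normalising constant kappa(theta) sums over all graphs in the support, which is what makes likelihood-based inference (and marginalisation) difficult.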


20 November 2015

Jack Bowden

University of Bristol (School of Social and Community Medicine)

On the physical interpretation of a meta-analysis in the presence of heterogeneity and bias: from clinical trials to Mendelian randomisation

Abstract: The funnel plot is a graphical visualisation of summary data estimates from a meta-analysis, and is a useful tool for detecting departures from the standard modelling assumptions. Although perhaps not widely appreciated, a simple extension of the funnel plot can help to facilitate an intuitive interpretation of the mathematics underlying a meta-analysis at a more fundamental level, by equating it to determining the centre of mass of a physical system. We used this analogy, with some success, to explain the concepts of weighing evidence and of biased evidence to a young audience at the Cambridge Science Festival, without recourse to precise definitions or statistical formulae. In this talk I aim to formalise this analogy at a more technical level using the estimating equation framework: firstly, to help elucidate some of the basic statistical models employed in a meta-analysis and secondly, to forge new connections between bias adjustment in the evidence synthesis and causal inference literatures.

This talk is based on joint work with Chris Jackson at the MRC Biostatistics Unit. For further reading see here.
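To make the centre-of-mass analogy concrete: a fixed-effect (inverse-variance weighted) meta-analysis estimate is exactly a weighted mean, i.e. the balance point of "masses" placed at the study estimates. The numbers below are made up purely for illustration.

```python
import numpy as np

# Hypothetical study-level estimates (e.g. log odds ratios) and standard errors.
theta = np.array([0.10, 0.35, 0.22, 0.05, 0.28])
se = np.array([0.15, 0.30, 0.10, 0.20, 0.25])

# Inverse-variance weights: each study contributes a "mass" of 1/se^2.
w = 1.0 / se**2

# The fixed-effect pooled estimate is the centre of mass of those weights.
pooled = np.sum(w * theta) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))

print(f"pooled estimate = {pooled:.3f} (SE {pooled_se:.3f})")
```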


4 December 2015

Peter W F Smith

University of Southampton (Social Statistics and Demography)

The Administrative Data Research Network, the Administrative Data Research Centre for England and Survey Representativeness

Abstract: I will first introduce the Administrative Data Research Network (ADRN) and the Administrative Data Research Centre for England (ADRC-E). The ADRN is a UK-wide partnership between universities, government departments and agencies, national statistics authorities, the third sector and researchers, funded by the Economic and Social Research Council. As part of the network, the ADRC-E provides research support staff and state-of-the-art secure facilities with access to high-performance computer systems to help accredited researchers carry out social and economic research using linked, de-identified administrative data – information which is routinely collected by government organisations, such as tax, education and health data. I will then describe one of the ADRC-E’s first projects which is assessing the use of representativeness indicators to monitor risks of non-response bias during survey data collection. This project makes use of a unique dataset linking call record paradata from three UK social surveys to census auxiliary attribute information on sample households.

Professor Peter Smith is the Director of the Administrative Data Research Centre for England and Professor of Social Statistics at the University of Southampton.


11 December 2015
Room COL 6.15 (Leverhulme Library), Department of Statistics, 6th Floor, Columbia House (69 Aldwych). LSE maps and directions

Special additional 2015 seminar (two speakers)

Timetable:
12:30 - 13:30: Buffet lunch
13:30 - 14:20: Professor Ming-Yen Cheng's talk
14:20 - 14:40: Coffee break
14:40 - 15:30: Dr Yichao Wu's talk

Ming-Yen Cheng

Professor, Department of Mathematics, National Taiwan University

Greedy forward regression for variable screening

Abstract: Two popular variable screening methods under the ultra-high-dimensional setting with the desirable sure screening property are sure independence screening (SIS) and forward regression (FR). Both are classical variable screening methods that have recently attracted greater attention in the light of high-dimensional data analysis. We consider a new and simple screening method that incorporates multiple predictors in each step of forward regression, with the decision on which variables to incorporate based on the same criterion. If only one step is carried out, it reduces to SIS. Thus it can be regarded as a generalisation and unification of FR and SIS. More importantly, it preserves the sure screening property and has computational complexity per step similar to that of FR, yet it can discover the relevant covariates in fewer steps. Thus, it reduces the computational burden of FR drastically while retaining its advantages over SIS. Furthermore, we show that it can find all the true variables if the number of steps taken is the same as the correct model size, even when using the original FR. An extensive simulation study and application to two real data examples demonstrate the excellent performance of the proposed method.
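A rough, unofficial sketch of the idea (the exact per-step criterion and batch size in the talk may differ) is to add the best few predictors per forward step rather than a single one:

```python
import numpy as np

def greedy_forward_screening(X, y, batch_size=5, n_steps=10):
    """Toy sketch: at each step, rank the remaining predictors by the
    residual sum of squares obtained when each is added (one at a time)
    to the current model, then add the best `batch_size` of them."""
    n, p = X.shape
    selected = []
    for _ in range(n_steps):
        remaining = [j for j in range(p) if j not in selected]
        if not remaining:
            break
        scores = []
        for j in remaining:
            Xj = X[:, selected + [j]]
            beta, *_ = np.linalg.lstsq(Xj, y, rcond=None)
            rss = np.sum((y - Xj @ beta) ** 2)
            scores.append((rss, j))
        scores.sort()                        # smallest RSS first
        selected += [j for _, j in scores[:batch_size]]
    return selected
```

With batch_size=1 this reduces to ordinary forward regression, while a single step with a large batch behaves like SIS, mirroring the unification described in the abstract.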

Yichao Wu

Associate Professor, Department of Statistics, North Carolina State University

Variable selection via measurement error model selection likelihoods

Abstract: The measurement error model selection likelihood was proposed in Stefanski, Wu and White (2014) to conduct variable selection. It provides a new perspective on variable selection. The first part of my talk will be a review of the measurement error model selection likelihoods. In the second part, I will present an extension to nonparametric variable selection in kernel regression. If time permits, I will talk briefly about a related project on structure recovery for additive models.


 22 January 2016

Richard Chandler

Professor of Statistics, Department of Statistical Science, UCL

The interpretation of climate model ensembles

Abstract: Almost all projections of future climate, and its impacts, rely at some level on the outputs of numerical models (simulators) of the climate system. These simulators represent the main physical and chemical (and, in some cases, biological) processes in the atmosphere and oceans. However, different simulators give different projections of future climate - indeed, the choice of simulator can be the dominant source of uncertainty in some applications. It is therefore becoming common practice to consider the outputs from several different simulators when making and using climate projections. The question then arises: how should the information from different simulators be combined? There are many challenging statistical issues here. Two key ones are (a) that simulators cannot be considered as independent (for example, many of them share common pieces of computer code); and (b) that no single simulator is uniformly better than another so that simple techniques, such as assigning weights to simulators, are not defensible. This talk will review the issues involved and present a Bayesian framework for resolving them. The ideas will be illustrated by considering projections of future global temperature.


5 February 2016

Bianca De Stavola

London School of Hygiene and Tropical Medicine

Current issues in mediation analysis

Abstract: In diverse fields of empirical research, attempts are made to decompose the effect of an exposure on an outcome into its effects via a number of different pathways. Path analysis has a long tradition in dealing with enquiries of this sort, but more recent contributions in the causal inference literature have led to greater understanding of the statistical estimands for these pathway-specific effects, the assumptions under which they can be identified, and statistical methods for doing so. However, the majority of causal inference contributions have focused on settings with no intermediate confounders and consider only partitioning the total effect of an exposure into the components that involve or do not involve a single mediator. These restrictions are very limiting in mediation studies applied to life course epidemiology, where intermediate confounding is the norm, or to studies involving multiple biomarkers as mediators, now increasingly common in the omics era. This talk will discuss extensions to these settings using examples taken from a life course study of eating disorders in girls. This work is in collaboration with Rhian Daniel (LSHTM), George Ploubidis (UCL) and Nadia Micali (UCL).
For further reading, please see here and here.
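For context, the single-mediator decomposition that the abstract refers to is (in standard counterfactual notation, with exposure levels 1 and 0, mediator M and outcome Y):

```latex
\underbrace{E\{Y(1, M(1))\} - E\{Y(0, M(0))\}}_{\text{total effect}}
\;=\;
\underbrace{E\{Y(1, M(0))\} - E\{Y(0, M(0))\}}_{\text{natural direct effect}}
\;+\;
\underbrace{E\{Y(1, M(1))\} - E\{Y(1, M(0))\}}_{\text{natural indirect effect}}
```

Extending this decomposition beyond a single mediator and to settings with intermediate confounding is the subject of the talk.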

 

 


19 February 2016

Mark Fiecas

University of Warwick

Modelling the Evolution of Brain Signals

Abstract: Our goal is to use local field potentials (LFPs) to rigorously study changes in neuronal activity in the hippocampus and the nucleus accumbens over the course of an associative learning experiment. We show that the spectral properties of the LFPs changed during the experiment. While many statistical models take into account nonstationarity within a single trial of the experiment, the evolution of brain dynamics across trials is often ignored.
In this talk, we will discuss a novel time series model that captures both sources of nonstationarity. Under the proposed model we rigorously define the spectral density matrix so that it evolves over time within a trial and also across the trials of an experiment. To estimate the evolutionary spectral density matrix, we used a two-stage procedure. In the first stage, we computed the within-trial time-localized periodogram matrix. In the second stage, we developed a data-driven approach for combining information across trials from the local periodogram matrices. We assessed the performance of our proposed method using simulated data. Finally, we used the proposed model to study how the spectral properties of the hippocampus and the nucleus accumbens evolved over the course of an associative learning experiment. This is joint work with Hernando Ombao (Department of Statistics, UC Irvine).
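The two-stage structure described above could be prototyped roughly as follows. This is only a sketch: the data-driven cross-trial weighting of the talk is replaced by a plain average, and the array shapes (trials of equal length, channels by time) are assumptions.

```python
import numpy as np

def local_periodogram_matrix(x, block_len):
    """x: (channels, time) array for one trial. Returns an array of shape
    (n_blocks, n_freqs, channels, channels) of raw periodogram matrices,
    one per time block: I(f) = d(f) d(f)^H / block_len, with d the DFT."""
    n_chan, n_time = x.shape
    n_blocks = n_time // block_len
    n_freqs = block_len // 2 + 1
    out = np.empty((n_blocks, n_freqs, n_chan, n_chan), dtype=complex)
    for b in range(n_blocks):
        seg = x[:, b * block_len:(b + 1) * block_len]
        d = np.fft.rfft(seg - seg.mean(axis=1, keepdims=True), axis=1)
        for f in range(n_freqs):
            out[b, f] = np.outer(d[:, f], np.conj(d[:, f])) / block_len
    return out

def pooled_spectral_estimate(trials, block_len):
    """Naive second stage: average the local periodogram matrices across
    trials (all trials assumed to share the same channels and length)."""
    mats = np.stack([local_periodogram_matrix(x, block_len) for x in trials])
    return mats.mean(axis=0)
```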

 


 4 March 2016

Wenyang Zhang

University of York

Estimation of High Dimensional Dynamic Covariance Matrix

Abstract: The estimation of a high dimensional covariance matrix is an important subject in statistics and econometrics. Most of the existing methods assume the covariance matrix is a constant matrix. This assumption limits the application of covariance matrix estimation. In many cases, the covariance matrix concerned is dynamic. In this talk, I am going to present a new type of dynamic covariance matrices. An estimation procedure for the proposed dynamic covariance matrices will be described in this talk. Intensive simulation studies are also conducted to show how well the proposed estimation methods work. Finally, I will show an example in which the proposed dynamic covariance matrices with the associated estimation procedure are used to allocate a portfolio in a stock market investment. The return of the portfolio constructed based on our method seems very encouraging.
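One common way to let a covariance matrix vary over time, not necessarily the construction proposed in the talk, is local (kernel-weighted) estimation; a minimal sketch:

```python
import numpy as np

def kernel_smoothed_covariance(X, t, bandwidth):
    """X: (T, p) observations ordered in time. Returns a locally weighted
    covariance matrix at time index t, using Gaussian kernel weights in
    time, so the estimate changes smoothly as t moves."""
    T = X.shape[0]
    w = np.exp(-0.5 * ((np.arange(T) - t) / bandwidth) ** 2)
    w /= w.sum()
    mu = w @ X                       # locally weighted mean
    Xc = X - mu
    return (Xc * w[:, None]).T @ Xc  # sum_i w_i (x_i - mu)(x_i - mu)^T
```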

 


18 March 2016

Quentin Berthet

University of Cambridge

Trade-offs in Statistical Learning

Abstract: I will explore the notion of constraints on learning procedures, and discuss the impact that they can have on statistical precision. This is inspired by real-life concerns such as limits on time for computation, on reliability of observations, or on communication between agents. I will show, through several examples, how these constraints can carry a concrete cost on the statistical performance of these procedures.

  


29 April 2016

Bharath Sriperumbudur

Pennsylvania State University

Density estimation in infinite dimensional exponential families

Abstract: We consider an infinite dimensional generalization of the natural exponential family of probability densities, which are parametrized by functions in a reproducing kernel Hilbert space (RKHS), and show it to be quite rich in the sense that a broad class of densities on R^d can be approximated arbitrarily well in Kullback-Leibler (KL) divergence by elements in the infinite dimensional family, P. Motivated by this approximation property, we consider the problem of estimating an unknown density p_0, through an element in P. Standard techniques like maximum likelihood estimation (MLE) or pseudo MLE (based on the method of sieves), which are based on minimizing the KL divergence between p_0 and P, do not yield practically useful estimators because of their inability to efficiently handle the log-partition function. We propose an estimator based on minimizing the Fisher divergence between p_0 and P, which involves solving a simple finite-dimensional linear system. We show the proposed estimator to be consistent, and provide convergence rates under a smoothness assumption that log(p_0) belongs to the image of the fractional power of a Hilbert-Schmidt operator defined on the RKHS. Through numerical simulations we demonstrate that the proposed estimator outperforms the non-parametric kernel density estimator, and that the advantage of the proposed estimator grows with increasing dimension.
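In standard notation (details may differ from the talk), the family and the objective referred to in the abstract are

```latex
p_f(x) \;=\; q_0(x)\, \exp\{ f(x) - A(f) \}, \qquad f \in \mathcal{H},\qquad
A(f) \;=\; \log \int q_0(x)\, e^{f(x)}\, dx,
```

with H an RKHS and q_0 a base density, and the Fisher (score-matching) divergence

```latex
J(p_0 \,\|\, p_f) \;=\; \tfrac{1}{2} \int p_0(x)\,
\bigl\| \nabla_x \log p_0(x) - \nabla_x \log p_f(x) \bigr\|_2^2 \, dx.
```

Because A(f) does not depend on x, its gradient drops out of the divergence, which is why this criterion avoids the log-partition function and reduces to a finite-dimensional linear system.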


 

6 May 2016

Zhou Zhou

University of Toronto

Inference for Non-stationary Time Series Regression with Inequality Constraints

Abstract: We consider statistical inference for time series linear regression where the response and predictor processes may experience general forms of abrupt and smooth non-stationary behaviours over time. Meanwhile, the regression parameters are subject to linear inequality constraints. A simple and unified procedure for structural stability check and parameter inference is proposed. The proposed methodology is shown to be consistent whether or not the true regression parameters are on the boundary of the restricted parameter space via utilizing an asymptotically invariant geometric property of polyhedral cones.


 

27 May 2016

Daniel Vogel

University of Aberdeen

Change-point tests based on U-statistics and U-quantiles

Abstract: Classical change-point tests are based on normal likelihood estimators. These are generally inefficient for heavy-tailed data, which in many areas is the norm rather than the exception. An appealing way of constructing improved tests is to use nonparametric or robust estimators, which show no loss under heavy tails. In this talk we study the theoretical foundations for a large class of such estimators: we prove functional limit theorems for U-statistics and U-quantiles of weakly dependent data. Statistics like Gini’s mean difference, Kendall’s tau (U-statistics), the Hodges–Lehmann estimator or the Qn scale estimator (U-quantiles) are efficient at the normal distribution and at the same time robust against heavy tails. Except for Gini’s mean difference, they are asymptotically normal regardless of the finiteness of any moments. We introduce the notion of near epoch dependence in probability (PNED), a very general weak dependence condition, which, in contrast to the traditional L2 NED, does not require the existence of any moments. We study in detail a change-point test based on the Hodges–Lehmann estimator as an alternative to the classical CUSUM test for location.
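For reference, the (one-sample) Hodges–Lehmann estimator mentioned above is simply the median of all pairwise averages; a naive O(n^2) version is sketched below.

```python
import numpy as np
from itertools import combinations_with_replacement

def hodges_lehmann(x):
    """One-sample Hodges-Lehmann location estimator: the median of all
    Walsh averages (x_i + x_j)/2 for i <= j. Robust to heavy tails while
    remaining efficient near the normal distribution."""
    walsh = [(a + b) / 2.0 for a, b in combinations_with_replacement(x, 2)]
    return float(np.median(walsh))
```

A change-point statistic of CUSUM type can then be built from robust location estimates computed before and after each candidate split point, in place of the sample means used by the classical CUSUM test.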

 


 3 June 2016

David Kaplan

University of Wisconsin - Madison

Bayesian Model Averaging Over Directed Acyclic Graphs With Implications for the Predictive Performance of Structural Equation Models

Abstract: This talk examines Bayesian model averaging as a means of addressing predictive performance in Bayesian structural equation models. The current approach to addressing the problem of model uncertainty lies in the method of Bayesian model averaging. We expand the work of Madigan and his colleagues by considering a structural equation model as a special case of a directed acyclic graph. We then provide an algorithm that searches the model space for submodels and obtains a weighted average of the submodels using posterior model probabilities as weights. Our simulation study provides a frequentist evaluation of our Bayesian model averaging approach and indicates that when the true model is known, Bayesian model averaging does not necessarily yield better predictive performance compared to non-averaged models. However, our case study using data from an international large-scale assessment reveals that the model-averaged submodels provide better posterior predictive performance compared to the initially specified model.
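The averaging step referred to here is the standard Bayesian model averaging identity, with posterior model probabilities acting as weights over the submodels M_1, ..., M_K:

```latex
p(\Delta \mid y) \;=\; \sum_{k=1}^{K} p(\Delta \mid M_k, y)\, p(M_k \mid y),
\qquad
p(M_k \mid y) \;=\; \frac{p(y \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(y \mid M_l)\, p(M_l)},
```

where Delta is the quantity of interest (here, a predictive quantity from the structural equation model).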

 

 

 

 

 
