
# Statistics Seminar Series 2014-15

The Department of Statistics hosts statistics seminars throughout the year. Seminars take place on Friday afternoons at 2pm, unless otherwise stated, in the Leverhulme Library (COL 6.15, Columbia House). All are very welcome to attend. Please contact Events for further information about any of these seminars.

Details of the 2014-15 Statistics Seminar Series will be published here as they are confirmed.

Friday 17 October 2014, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Maps and directions

George Ploubidis
Institute of Education, University of London

Title: Psychological distress in mid-life in 1958 and 1970 cohorts: the role of childhood experiences and behavioural adjustment

Abstract: This paper addresses the levels of psychological distress experienced in mid-life (age 42) by men and women born in 1958 and 1970, using two well-known population-based UK birth cohorts (NCDS and BCS70). Our aim was to test empirically whether psychological distress has increased, and if so whether this increase can be explained by differences between the cohorts in their childhood conditions (including birth and parental characteristics), as well as differences in their social and emotional adjustment during adolescence. The measurement equivalence of psychological distress between the two cohorts was formally established using methods within the generalised latent variable modelling framework. The potential role of childhood conditions and of social and behavioural adjustment in explaining between-cohort differences was investigated with modern causal mediation methods. Differences in psychological distress between the NCDS and BCS70 cohorts at age 42 were observed, with the BCS70 cohort being on average more psychologically distressed. These differences were more pronounced in men, for whom the effect was twice as large as for women. For both men and women it appears that this effect is not due to the hypothesised factors in early life and adolescence, since these accounted for only 15% of the between-cohort difference in men and 20% in women.

Friday 31 October 2014, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Maps and directions

Lionel Truquet
Université de Rennes

Title: Statistical inference in semiparametric locally stationary ARCH models

Abstract: In this work, we consider semiparametric versions of the univariate time-varying ARCH(p) model introduced by Dahlhaus & Subba Rao (2006) and studied by Fryzlewicz, Sapatinas and Subba Rao (2008). For a given nonstationary data set, a natural question is to determine which coefficients capture the nonstationarity and then which coefficients can be assumed to be non time-varying. For example, when the intercept is the single time-varying coefficient, the resulting model is close to a multiplicative volatility model in the sense of Engle & Rangel (2008) or Hafner and Linton (2010). Using kernel estimation, we will first explain how to estimate the parametric and the nonparametric component of the volatility and how to obtain an asymptotically efficient estimator of the parametric part when the noise is Gaussian. The problem of testing whether some coefficients are constant or not is also addressed. In particular, our procedure can be used to test the existence of a second-order dynamic in this nonstationary framework. Our methodology can be adapted to more general linear regression models with time-varying coefficients, in the spirit of Zhang & Wu (2012).

References:
[1] Dahlhaus, R., Subba Rao, S. Statistical inference for time-varying ARCH processes. The Annals of Statistics (2006), Vol. 34, No. 3, 1075-1114.
[2] Engle, R. F., Rangel, J. G. The spline-GARCH model for low-frequency volatility and its global macroeconomic causes. Rev. Financ. Stud. (2008), Vol. 21, No. 3.
[3] Fryzlewicz, P., Sapatinas, T., Subba Rao, S. Normalized least-squares estimation in time-varying ARCH models. The Annals of Statistics (2008), Vol. 36, No. 2, 742-786.
[4] Hafner, C. M., Linton, O. Efficient estimation of a multivariate multiplicative volatility model. Journal of Econometrics (2010), Vol. 159, No. 1, 55-73.
[5] Zhang, T., Wu, W. B. Inference of time-varying regression models. The Annals of Statistics (2012), Vol. 40, No. 3, 1376-1402.

Friday 14 November 2014, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Maps and directions

Paul Nulty
LSE (Department of Methodology)

Title: Tools and Methods for Quantitative Text Analysis

Abstract: In this talk I present an overview of methods used for quantitative analysis of large text corpora. I begin by describing practical issues involved in using software to retrieve information from large text files, online text, and social media text streams. I discuss how text is transformed for quantitative analysis by extracting a word frequency matrix or other relevant features for machine learning, and describe software in development on the QUANTESS project to facilitate this process. Finally, I will discuss the statistical properties of natural language text, and present ongoing research on improving methods for extracting features from text for use with standard machine learning algorithms, with application to the scaling of political texts.
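As a rough illustration of the word frequency matrix mentioned in the abstract, the following minimal Python sketch counts words per document. It is not the QUANTESS software: the function names `tokenize` and `word_frequency_matrix`, the simple regular-expression tokenizer, and the two example sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequency_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts word j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is the sorted union of all words seen in any document.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical example documents, not taken from the talk.
docs = [
    "Tax cuts help growth.",
    "Growth needs public spending, not tax cuts.",
]
vocabulary, matrix = word_frequency_matrix(docs)
for word, column in zip(vocabulary, zip(*matrix)):
    print(word, column)
```

Each row of `matrix` is one document and each column one vocabulary word, which is the shape that standard machine learning algorithms typically expect as input.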

Please also see the Big Data Initiative Seminar Series page

Friday 28 November 2014, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Maps and directions

Yang Feng
Columbia University

THIS SEMINAR HAS BEEN CANCELLED.

Title: Model Selection in High-Dimensional Misspecified Models

Abstract: Model selection is indispensable to high-dimensional sparse modeling in selecting the best set of covariates among a sequence of candidate models. Most existing work assumes implicitly that the model is correctly specified or of fixed dimensions. Yet model misspecification and high dimensionality are common in real applications. In this paper, we investigate two classical Kullback-Leibler divergence and Bayesian principles of model selection in the setting of high-dimensional misspecified models. Asymptotic expansions of these principles reveal that the effect of model misspecification is crucial and should be taken into account, leading to the generalized AIC and generalized BIC in high dimensions. With a natural choice of prior probabilities, we suggest the generalized BIC with prior probability which involves a logarithmic factor of the dimensionality in penalizing model complexity. We further establish the consistency of the covariance contrast matrix estimator in a general setting. Our results and new method are supported by numerical studies.

Friday 12 December 2014, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Maps and directions

Title: Multiscale Bayes in density estimation

Abstract:  We present a nonparametric Bayesian analysis of the density estimation model with i.i.d. data on the unit interval. More specifically, using a multiscale approach, we derive results on convergence rates for the posterior distribution as well as limit theorems for functionals of the density, for certain families of prior distributions. We consider a few examples of such families, such as renormalized Gaussian processes and Polya tree priors.

Friday 16 January 2015 2pm-3pm, Room COL 6.15 Columbia House (sixth floor)
Map and Directions

Bernard Silverman
University of Oxford

Title: Science and mathematics in the Home Office

Abstract: I will describe my role and work as Chief Scientific Adviser in the Home Office, and describe a range of examples where mathematics and science have a demonstrable impact on policy, with a focus on areas where statistical thinking and expertise have been useful. In Forensic Science alone, these range from Protection of Freedoms legislation about the retention of DNA profiles to an evaluation of the risks of new DNA profiling protocols. My major illustrative example, however, will be the novel use of multiple systems estimation to gain insight into the scale of Modern Slavery in the UK, the way this has fed into the Government's Modern Slavery Strategy, and the wider science/policy issues this work presented.

Friday 6 February 2015 2pm - 3pm Room COL 6.15 Columbia House (sixth floor)
Map and Directions

Dino Sejdinovic
University of Oxford

Title: Hypothesis testing with Kernel embeddings on big and interdependent data

Abstract: Embeddings of probability distributions into a reproducing kernel Hilbert space provide a flexible framework for non-parametric hypothesis tests, including two-sample, independence, and three-variable (Lancaster) interaction tests. In practice, two main limitations of this methodology are that it generally requires time (at least) quadratic in the number of observations and that the test correctness heavily relies on observations being independent. We overview how these tests can be scaled up to large datasets using mini-batch procedures, resulting in consistent tests suited to data streams or to situations when the observations cannot be stored in memory. Kernel selection can also be performed on-the-fly in order to maximize the asymptotic efficiency of these tests. Furthermore, we show consistency of a wild bootstrap procedure for kernel-based tests on random processes, and demonstrate its use in the study of dependence between time series across multiple time lags.
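To make the quadratic cost mentioned in the abstract concrete, here is a minimal sketch of a kernel two-sample statistic. It assumes the standard maximum mean discrepancy (MMD) with a Gaussian kernel, which the abstract does not name explicitly; the function names, the bandwidth, and the simulated samples are illustrative choices, and the sketch covers neither the mini-batch nor the wild bootstrap procedures from the talk.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two real numbers."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples.

    Every pair of observations is visited, so the cost is quadratic in the
    sample size.
    """
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Simulated data: two samples from N(0, 1) and one from N(2, 1).
rng = random.Random(0)
same_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
same_b = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]

print("same distribution:   ", mmd_squared(same_a, same_b))
print("shifted distribution:", mmd_squared(same_a, shifted))
```

The statistic should be close to zero when both samples come from the same distribution and clearly larger when one is shifted; turning it into a test additionally requires a null distribution, for example from a permutation or bootstrap scheme.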

Friday 20 February 2015 12pm - 1pm Room COL 6.15 Columbia House (sixth floor)
Map and Directions

Panagiotis Merkouris
Athens University of Economics and Business

Title: On best linear unbiased estimation and calibration in survey sampling

Abstract: A unified theory of optimal composite estimation in survey sampling settings involving combination of independent or correlated estimates from various survey sources can be formulated using the principle of best linear unbiased estimation. This applies to traditional survey designs involving data combination, such as multiple-frame and multi-phase sampling, and to various forms of combining data from independent or dependent samples with overlapping survey content, as in split-questionnaire designs, rotating panel surveys, non-nested double sampling and supplement surveys. An equivalent practical formulation of optimal composite estimation involving micro-integration of data from different samples is possible through a suitable calibration scheme for the sampling weights of the combined sample. The calibrated weights can be used to calculate weighted statistics, including totals, means, ratios, quantiles and regression coefficients. In particular, they give rise to composite estimators of population totals that are asymptotically best linear unbiased estimators. This unified approach to constructing optimal composite estimators through calibration will be illustrated with three distinct survey paradigms.

(Sandwiches and refreshments will be available in CLM 3.02, Clement House, at 1pm after the conclusion of this seminar)

Friday 20 February 2015 2pm - 3pm Room CLM 3.02 Clement House (third floor)
(Sandwiches and refreshments available at 1pm)
Map and directions

David Hand
Imperial College London

Title: From Big Data to Beyond Data: Extracting the Truth

Abstract: We are inundated with messages about the promise offered by big data. Economic miracles, scientific breakthroughs, technological leaps appear to be merely a matter of taking advantage of a resource which is increasingly widely available. But is everything as straightforward as these promises seem to imply? I look at the history of big data, distinguish between different kinds of big data, and explore whether we really are at the start of a revolution. No new technology is achieved without effort and without overcoming obstacles, and I describe some such obstacles that lie in the path of realising the promise of big data.

Friday 6 March 2015 2pm - 3pm Room COL 6.15, Columbia House (sixth floor)
Map and Directions

Ruggero Bellio
University of Udine

Title: Likelihood-based inference with many nuisance parameters: Some recent developments

Abstract: We review frequentist inference on parameters of interest in models with many nuisance parameters, suitable for data with a stratified structure. In particular, two different likelihood-based methods are illustrated. The first method is the modified profile likelihood, where the nuisance parameters are removed through maximization. The second method is the integrated likelihood, where the nuisance parameters are eliminated through integration, using a suitable weight function. The application to some special settings is considered in some detail. In particular, the focus is on fixed-effects panel data models, small-sample meta-analysis, and item response theory models.

Friday 13 March 2015 2pm - 3pm Room COL 6.15, Columbia House (sixth floor)
Map and Directions

Eric Kolaczyk
Boston University

Title: Statistical Analysis of Network Data in the Context of 'Big Data': Large Networks and Many Networks

Abstract: One of the key challenges in the current era of 'Big Data' is the ubiquity of structured data, and one particularly prominent example of such data is network data. In this talk we look at two of the ways that network data can be 'big': in the sense of networks of many nodes, and in the sense of many networks. Within this context, I will present two vignettes showing how network versions of quite fundamental statistical problems remain yet to be addressed.

Specifically, I will touch on the problems of (i) propagation of uncertainty to summary statistics of 'noisy' networks, and (ii) estimation and testing for large collections of network data objects. In both cases I will present a formalization of a certain class of problems encountered frequently in practice, describe our work in addressing the core aspects of the problem, and point to some of the many outstanding challenges remaining.

Friday 20 March 2015, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Maps and directions

Sofia Olhede
University College London

Title: Understanding Large Networks using Blockmodels

Abstract: Networks have become pervasive in practical applications. Understanding large networks is hard, especially because a number of typical features present in such observations create technical challenges for analysis. I will discuss some basic network models that are tractable for analysis, what sampling properties they can reproduce, and some results relating to their inference. I will especially touch on the importance of the stochastic block model as an analysis tool.

This is joint work with Patrick Wolfe (UCL).

Friday 8 May 2015, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Maps and Directions

Jinyuan Chang
University of Melbourne

Title: Simulation-based Hypothesis Testing of High Dimensional Means Under Covariance Heterogeneity – An Alternative Road to High Dimensional Tests

Abstract: Hypothesis testing for high-dimensional mean vectors has gained increasing attention and stimulated innovative methodologies in statistics. In this paper, we introduce a computationally fast simulation-based testing procedure which is adaptive to the covariance structure in the data for both one- and two-sample problems. The proposed procedures are based on maximum-type statistics and the critical values are computed via the Gaussian approximation. Unlike most existing methods, which rely on various regularity conditions on the covariance matrix, our method imposes no assumptions on the dependence structure of the underlying distributions. When testing against sparse alternatives, we suggest a pre-screening step to improve the power of the proposed tests. A data-driven procedure is proposed for practical implementations. Theoretical properties of the proposed one- and two-step testing procedures are investigated. Thorough numerical experiments on both synthetic and real datasets are provided to back up our theoretical results.

Friday 22 May 2015, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Map and Directions

Ajay Jasra
National University of Singapore

Title: Multilevel Sequential Monte Carlo Samplers

Abstract: The approximation of expectations w.r.t. probability distributions associated to the solution of partial differential equations (PDEs) is considered herein; this scenario appears routinely in Bayesian inverse problems. In practice, one often has to solve the associated PDE numerically, using, for instance, finite element methods, which leads to a discretisation bias with step-size level $h_L$. In addition, the expectation cannot be computed analytically and one often resorts to Monte Carlo methods. In the context of this problem, it is known that the introduction of the multi-level Monte Carlo (MLMC) method can reduce the amount of computational effort needed to estimate expectations, for a given level of error. This is achieved via a telescoping identity associated to a Monte Carlo approximation of a sequence of probability distributions with discretisation levels $\infty > h_0 > h_1 > \cdots > h_L$. In many practical problems of interest, one cannot achieve i.i.d. sampling of the associated sequence of probability distributions. A sequential Monte Carlo (SMC) version of the MLMC method is introduced to deal with this problem. It is shown that, under appropriate assumptions, the attractive property of a reduction in the amount of computational effort needed to estimate expectations, for a given level of error, can be maintained in the SMC context. The approach is numerically illustrated on a Bayesian inverse problem. This is joint work with Kody Law (KAUST), Raul Tempone (KAUST) and Alex Beskos (UCL).
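The telescoping identity referred to in the abstract can be written out as follows. The notation is an assumption made for illustration: $g$ is the quantity of interest, $\eta_l$ is the probability distribution at discretisation level $h_l$, and $N_l$ is the number of samples used at level $l$; none of these symbols appear in the original abstract.

```latex
% Telescoping identity: the finest-level expectation is the coarsest-level
% expectation plus a sum of level-to-level corrections.
\mathbb{E}_{\eta_L}[g]
  = \mathbb{E}_{\eta_0}[g]
  + \sum_{l=1}^{L} \Bigl( \mathbb{E}_{\eta_l}[g] - \mathbb{E}_{\eta_{l-1}}[g] \Bigr)

% Standard MLMC estimator: each term is approximated separately, where
% (X_l^{(i)}, X_{l-1}^{(i)}) is a coupled pair with marginals \eta_l and \eta_{l-1}.
\widehat{g}_{\mathrm{MLMC}}
  = \frac{1}{N_0} \sum_{i=1}^{N_0} g\bigl(X_0^{(i)}\bigr)
  + \sum_{l=1}^{L} \frac{1}{N_l} \sum_{i=1}^{N_l}
      \Bigl( g\bigl(X_l^{(i)}\bigr) - g\bigl(X_{l-1}^{(i)}\bigr) \Bigr)
```

The saving comes from the corrections: when consecutive levels are well coupled the differences have small variance, so the expensive fine levels need comparatively few samples.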

Tuesday 26 May 2015, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Please note that this seminar takes place on a Tuesday, as opposed to the usual Friday slot.
Map and Directions

Yang Feng
Columbia University

Title: A Conditional Dependence Measure with Applications to Undirected Graphical Models

Abstract: Measuring conditional dependence is an important topic in statistics with broad applications including graphical models. Under a factor model setting, a new conditional dependence measure is proposed. The measure is derived by using distance covariance after adjusting for the common observable factors or covariates. The corresponding conditional independence test is given with the asymptotic null distribution unveiled. The latter gives a somewhat surprising result: the estimating errors in factor loading matrices, while of root-n order, do not have material impact on the asymptotic null distribution of the test statistic, which is also in the root-n domain. It is also shown that the new test has strict control over the asymptotic significance level and can be calculated efficiently. A generic method for building dependency graphs using the new test is elaborated. Numerical results and real data analysis show the superiority of the new method.

Friday 29 May 2015, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Map and Directions

Eva Petkova
New York University

Director of Biostatistics Division at the Department of Child and Adolescent Psychiatry
Associate Professor of Biostatistics
Child and Adolescent Psychiatry and Population Health, New York University Langone Medical Center, New York, NY

Title: Personalized Medicine and Generated Effect Modifiers

Abstract: Personalized medicine focuses on making treatment decisions for an individual patient based on her/his clinical, biological, behavioral and other data. In contrast, for many years clinical trials have been performed to compare different treatments on average across some target population, e.g., individuals with depression. All along, clinicians have been aware that treatments do not work the same way for all patients; thus, even if treatment A is better than treatment B on average, there might be patients who would do better on treatment B than on treatment A. Because of that, in randomized clinical trials researchers not only compare the effect of treatments on average, but they also try to determine whether any patient characteristics have a different effect on the outcome, depending on the treatment. In regression models for the outcome, if there is a non-zero interaction between treatment and a baseline patient characteristic, that predictor is called an effect modifier. Identification of such effect modifiers is crucial as we move towards personalized medicine, i.e., optimizing treatment assignment based on measurements made on a subject when s/he presents for treatment. Recent years have seen rapidly growing interest in personalized medicine, both in clinical research and in statistical methodology. In clinical research, finding patient characteristics that can inform which treatment would benefit which patient has moved from being a secondary goal of classic randomized clinical trials, designed to establish the efficacy of an experimental treatment, to being a central aim. There are already a number of studies where the primary goal is to identify biosignatures of treatment response, and the number of such studies is expected to increase in the coming years. In the statistical literature, “personalized medicine” and “optimal treatment regime” continue to be intensely studied after they were first formalized by Murphy (2003) and Robins (2004).
A treatment decision is an algorithm that takes as input patient data (X) and outputs a (binary) treatment recommendation – 0 (give treatment A) or 1 (treatment B). An optimal treatment decision would be one that maximizes the treatment benefit averaged over the entire target patient population. In this talk I will present a formal framework for optimal treatment decisions and will illustrate how statistical inferences can be made on different treatment decisions using a large number of baseline scalar and functional patient characteristics collected in randomized clinical trials.
This is joint work with Drs. T. Tarpey from Wright State University, R.T. Ogden from Columbia University, and A. Ciarleglio, B. Jiang and Z. Su from NYU.
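The idea of a treatment decision as a function from patient data X to a recommendation in {0, 1}, and of comparing decisions by their population-average benefit, can be sketched as follows. This is a minimal illustration and not the framework presented in the talk: the threshold rule, the inverse-probability-weighted value estimate, the assumed randomization probability of 0.5, and the four-patient data set are all hypothetical.

```python
def threshold_rule(x, threshold=0.0):
    """Hypothetical decision: recommend treatment B (1) if x > threshold, else A (0)."""
    return 1 if x > threshold else 0


def estimated_value(records, rule, prob_b=0.5):
    """Inverse-probability-weighted estimate of the mean outcome under `rule`.

    records: (x, treatment, outcome) triples from a randomized trial in which
    treatment B (coded 1) is assigned with probability prob_b (an assumption).
    Only patients whose assigned treatment agrees with the rule contribute,
    each weighted by one over the probability of the treatment they received.
    """
    total = 0.0
    n = 0
    for x, treatment, outcome in records:
        n += 1
        if treatment == rule(x):
            prob_received = prob_b if treatment == 1 else 1.0 - prob_b
            total += outcome / prob_received
    return total / n


# Made-up trial data: (baseline covariate, assigned treatment, outcome).
trial = [(-1.0, 0, 5.0), (-1.0, 1, 2.0), (1.0, 0, 1.0), (1.0, 1, 6.0)]

value_personalized = estimated_value(trial, threshold_rule)
value_always_a = estimated_value(trial, lambda x: 0)
value_always_b = estimated_value(trial, lambda x: 1)
print(value_personalized, value_always_a, value_always_b)
```

In this toy data set the covariate is an effect modifier: treatment A is better for patients with a negative covariate and treatment B for those with a positive one, so the personalized rule has a higher estimated value than giving everyone the same treatment.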
