
Statistics Seminar Series 2013-14
The Department of Statistics hosts seminars throughout the year. Seminars take place on Friday afternoons at 2pm, unless otherwise stated, in the Leverhulme Library (COL 6.15). All are very welcome to attend and refreshments are provided. Please contact Events for further information about any of these seminars.
29th May 2014

Ryan Tibshirani (Carnegie Mellon University)
Title: Adaptive piecewise polynomial estimation via trend filtering
Abstract: We discuss trend filtering, a recently proposed tool of Kim et al. (2009) for nonparametric regression. The trend filtering estimate is defined as the minimizer of a penalized least squares criterion, in which the penalty term sums the absolute kth order discrete derivatives over the input points. Perhaps not surprisingly, trend filtering estimates appear to have the structure of kth degree spline functions, with adaptively chosen knot points (we say "appear" here as trend filtering estimates are not really functions over continuous domains, and are only defined over the discrete set of inputs). This brings to mind comparisons to other nonparametric regression tools that also produce adaptive splines; in particular, we compare trend filtering to smoothing splines, which penalize the sum of squared derivatives across input points, and to locally adaptive regression splines (Mammen & van de Geer 1997), which penalize the total variation of the kth derivative.
Empirically, trend filtering estimates adapt to the local level of smoothness much better than smoothing splines, and further, they exhibit a remarkable similarity to locally adaptive regression splines. Theoretically, (suitably tuned) trend filtering estimates converge to the true underlying function at the minimax rate over the class of functions whose kth derivative is of bounded variation. The proof of this result follows from an asymptotic pairing of trend filtering and locally adaptive regression splines, which have already been shown to converge at the minimax rate (Mammen & van de Geer 1997). At the core of this argument is a new result tying together the fitted values of two lasso problems that share the same outcome vector, but have different predictor matrices.
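As a rough illustration of the criterion described in the abstract, the sketch below evaluates a trend filtering objective in numpy: squared error plus an l1 penalty on higher-order discrete differences. This is a minimal sketch under the assumption of unit-spaced inputs, not the authors' solver; the function name and example vectors are our own.

```python
import numpy as np

def tf_criterion(beta, y, lam, k=1):
    """Trend filtering objective (unit-spaced inputs): squared error plus
    lam times the sum of absolute (k+1)-th order discrete differences."""
    resid = 0.5 * np.sum((y - beta) ** 2)
    penalty = np.sum(np.abs(np.diff(beta, n=k + 1)))
    return resid + lam * penalty

# A globally linear fit has zero k=1 penalty; a piecewise-linear fit with one
# kink pays |change in slope| at the knot -- this is how the l1 penalty
# selects knots adaptively.
beta_lin = np.arange(6, dtype=float)
beta_kink = np.array([0., 1., 2., 3., 2., 1.])
c_lin = tf_criterion(beta_lin, beta_lin, lam=1.0)    # 0.0: no residual, no kink
c_kink = tf_criterion(beta_kink, beta_kink, lam=1.0)  # 2.0: slope jumps +1 -> -1
print(c_lin, c_kink)
```

Minimising this objective over beta (e.g. with a convex solver) yields the trend filtering estimate; the demonstration above only evaluates the criterion at fixed vectors.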

21st March 2014

Jochen Einbeck (Durham University)
Title: Localized principal components and curves
Abstract: We investigate properties and applications of localized principal components, where localization is achieved through the use of kernels as weight functions. In particular, we provide an asymptotic approximation (assuming large sample size and small bandwidths) of the first localized principal component at any given point, which turns out to depend only on the bandwidth parameter(s) and the density at that point. This result is extended to the context of local principal curves, where the characteristics of the points at which the curve stops at the boundaries are identified. This is used to provide a method which allows the curve to proceed beyond its natural endpoint if desired. Finally, we also consider the possibility of localizing PCA with respect to external variables such as time.
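A minimal sketch of the basic construction (Gaussian kernel weights, weighted covariance, leading eigenvector); the function name and the toy data are our own, and the talk's asymptotic results are not reproduced here:

```python
import numpy as np

def local_pc1(X, x0, h):
    """First localized principal component at x0: leading eigenvector of the
    kernel-weighted covariance matrix, with Gaussian weights of bandwidth h."""
    d2 = np.sum((X - x0) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / h ** 2)
    w /= w.sum()
    mu = w @ X                       # local (weighted) mean
    Xc = X - mu
    S = (Xc * w[:, None]).T @ Xc     # local weighted covariance
    vals, vecs = np.linalg.eigh(S)
    return mu, vecs[:, -1]           # eigenvector of the largest eigenvalue

# Points on the line through the origin with direction (1, 2): the first
# localized principal component at any point should recover that direction.
t = np.linspace(-1.0, 1.0, 101)
X = np.column_stack([t, 2 * t])
mu, v = local_pc1(X, x0=np.array([0.5, 1.0]), h=0.3)
align = abs(v @ np.array([1.0, 2.0]) / np.sqrt(5.0))
print(align)   # close to 1: the local component aligns with the line
```

Local principal curves chain such local components together, stepping from point to point along the estimated direction.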

7th March 2014

Oliver Ratmann (Imperial College)
Title: Statistical modelling of summary values leads to accurate Approximate Bayesian Computations
Abstract: Approximate Bayesian Computations (ABC) are considered to be noisy even when sufficient statistics are available or when approximately sufficient statistics can be constructed. We present a rigorous theoretical framework, referred to as ABC*, under which ABC is set up such that the mode of the true posterior density is estimated exactly and such that the Kullback-Leibler divergence of the ABC approximation to the true posterior density is very small. The main idea is to construct, through statistical modelling of so-called summary values, an appropriate parametric probability space on which the ABC approximation can be controlled through statistical decision theory. ABC* fully specifies which test statistics to use, how to combine them, how to set the tolerances and how long to simulate. The approximation error due to the tolerances is always controlled. The ABC* regularity conditions on the summary values are relatively strong, so that in practice some approximation error remains. Several examples and an application to time series data of influenza A (H3N2) infections in the Netherlands illustrate ABC* in action and explore its limitations.
(Paper can be downloaded from http://arxiv.org/abs/1305.4283).
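For readers unfamiliar with the baseline that ABC* refines, here is plain rejection ABC on a toy normal-mean problem. This is only a sketch of the standard algorithm with hand-picked tolerance and simulation effort; ABC* is precisely about prescribing those choices, and nothing below is taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Observed" data from a Normal(theta, 1) model; the summary value is the
# sample mean (sufficient here, so rejection ABC is well behaved).
theta_true, n = 2.0, 50
s_obs = rng.normal(theta_true, 1.0, size=n).mean()

# Plain rejection ABC: draw from the prior, simulate the summary value, keep
# draws whose summary falls within tolerance eps of the observed one.
m, eps = 20000, 0.2
theta = rng.uniform(-5.0, 5.0, size=m)                      # prior draws
s_sim = rng.normal(theta[:, None], 1.0, size=(m, n)).mean(axis=1)
accepted = theta[np.abs(s_sim - s_obs) <= eps]

post_mean = accepted.mean()
print(len(accepted), post_mean)
```

The accepted draws approximate the posterior; shrinking eps reduces the approximation error at the cost of fewer acceptances, which is the trade-off ABC* controls formally.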

7th February 2014

Yves Rosseel (Ghent University)
Title: lavaan: an R package for structural equation modelling
Abstract: Structural equation modeling (SEM) is a vast field, widely used by applied researchers in the social and behavioral sciences. Over the years, many software packages for structural equation modeling have been developed, both free and commercial. However, perhaps the best state-of-the-art software packages in this field are still closed-source and/or commercial. The R package `lavaan' has been developed to provide applied researchers, teachers, and statisticians with a free, fully open-source, but commercial-quality package for latent variable modeling. In this presentation, I will explain the aims behind the development of the package, give an overview of its most important features, and provide some examples to illustrate how lavaan works in practice. Finally, I will discuss how lavaan attempts to capture the (computational) history of SEM, and how preparations are made to shape the future of SEM.

24th January 2014

Daniel Oberski (Tilburg University)
Title: A measure to evaluate model fit by sensitivity analysis
Abstract: Latent variable models involve restrictions on the data that can be formulated in terms of "misspecifications": restrictions with a model-based meaning. Examples include zero cross-loadings and local dependencies, as well as "measurement invariance" or "differential item functioning". If incorrect, misspecifications can potentially disturb the main purpose of the latent variable analysis, seriously so in some cases.
I propose to evaluate whether a particular analysis at hand is such a case or not. To do this, I define a measure based on the likelihood of the restricted model that approximates the change in the parameters of interest if the misspecification were freed: the EPC-interest. The main idea is to examine the EPC-interest and free those misspecifications that are "important" while ignoring those that are not. I have implemented the EPC-interest in the lavaan software for structural equation modeling and the Latent Gold software for latent class analysis. This approach can resolve several problems and inconsistencies in the current practice of model fit evaluation in latent variable analysis, something I illustrate using analyses from the "measurement invariance" literature and from item response theory.
References
Preprints of the papers can be found at http://daob.nl/publications

13th December 2013

Wei Gao (Northeast Normal University)
Title: Proposed Estimators for Dynamic and Static Probit Models with Panel Data
Abstract: When one deals with discrete panel data, latent variable models with individual effects are often introduced. Except for the case where the latent variable given the individual effects is logistically distributed, one usually assumes that the individual effects follow some distribution with unknown parameters. When the number of observations per individual is small, the results of the analysis are sensitive to the chosen distribution of the individual effects. In this paper, new statistics are proposed for dynamic and static probit models with panel data. Simulation studies show that the proposed statistics work well.

22nd November 2013

Junichi Hirukawa (Niigata University)
Title: Locally stationary processes in time series analysis
Abstract: The theory of time series analysis has been well established under the assumption of stationarity. However, the assumption that the spectral structure of a time series never changes seems restrictive. When we deal with nonstationary processes, one of the difficult problems to solve is how to set up an adequate asymptotic theory. To meet this challenge, Dahlhaus introduced an important class of nonstationary processes, called locally stationary processes, within a rigorous asymptotic framework. In this talk we deal with some topics concerning locally stationary processes, e.g. the locally stationary factor model.
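A simple member of this class is an AR(1) process whose coefficient varies smoothly in rescaled time t/T, as in Dahlhaus's framework. The sketch below (our own toy simulation, not from the talk) shows how the local variance of such a process changes as the coefficient decays:

```python
import numpy as np

def tvar1(T, a_fun, rng):
    """Simulate a time-varying AR(1), X_t = a(t/T) X_{t-1} + e_t, a simple
    example of a locally stationary process in rescaled time u = t/T."""
    x = np.zeros(T)
    e = rng.standard_normal(T)
    for t in range(1, T):
        x[t] = a_fun(t / T) * x[t - 1] + e[t]
    return x

rng = np.random.default_rng(1)
x = tvar1(2000, lambda u: 0.9 - 0.8 * u, rng)   # AR coefficient drifts 0.9 -> 0.1
# Locally the process behaves like a stationary AR(1) with variance
# 1/(1 - a^2), so the sample variance shrinks as the coefficient decays:
v_early, v_late = x[100:500].var(), x[1500:1900].var()
print(v_early, v_late)
```

The asymptotics of such models are taken as T grows with a(t/T) held as a fixed function of rescaled time, so that ever more observations fall in each neighbourhood of u.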

15th November 2013

Lajos Gergely Gyurko (University of Oxford)
Title: Extracting information from the signature of a data stream
Abstract: Market events such as order placement and order cancellation are examples of the complex and substantial flow of data that surrounds a modern financial engineer. New mathematical techniques, developed to describe the interactions of complex oscillatory systems (known as the theory of rough paths) provide new tools for analysing and describing these data streams and extracting the vital information. In this talk we illustrate how a very small number of coefficients obtained from the signature of financial data can be sufficient to classify this data for subtle underlying features and make useful predictions.
This talk presents financial examples in which we learn from data and then proceed to classify fresh streams. The classification is based on features of streams that are specified through the coordinates of the signature of the path. At a mathematical level the signature is a faithful transform of a multidimensional time series (ref.: Ben Hambly and Terry Lyons 2010). Hao Ni, Terry Lyons and Daniel Levin (2013) introduced the possibility of its use to understand financial data and pointed to the potential this approach has for machine learning and prediction. We evaluate and refine these theoretical suggestions against practical examples of interest and present a few motivating experiments which demonstrate the information the signature can easily capture in a nonparametric way, avoiding traditional statistical modelling of the data.
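To make the "small number of coefficients" concrete, here is the depth-2 signature of a piecewise-linear path, accumulated segment by segment via Chen's relation. This is a standard textbook construction, not code from the talk; the function name and example path are our own.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path (rows = points, cols =
    dimensions): level 1 is the total increment, level 2 collects the
    iterated integrals, built up one linear segment at a time."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)  # Chen's relation
        s1 += dx
    return s1, s2

# One counter-clockwise loop around the unit square: the level-1 increment
# vanishes, but the antisymmetric part of level 2 (the Levy area) records
# twice the enclosed signed area -- order of movement matters.
square = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
s1, s2 = signature_depth2(square)
print(s1)                      # [0. 0.]
print(s2[0, 1] - s2[1, 0])     # 2.0
```

Higher-order coordinates are built the same way; in applications a truncated collection of these coordinates serves as the feature vector fed to a classifier.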

1st November 2013

Ioannis Kosmidis (UCL)
Title: Shrinking bias to benefit estimation and inference in statistical models
Abstract: In this talk we present some recent work on a unified computational and conceptual framework for reducing the bias in the estimation of statistical models from a practitioner's point of view. The talk will discuss several of the shortcomings of classical estimators (like the MLE), with demonstrations based on real and artificial data, for several widely used statistical models, including binomial and categorical response models (for both nominal and ordinal responses) and beta regression. The main focus will be on how those shortcomings can be overcome by reducing bias. A generic, easy-to-implement algorithm for reducing the bias in any statistical model will also be presented, along with special-purpose algorithms that take advantage of specific model structures.
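One of the shortcomings alluded to can be seen in the simplest binomial case, where the MLE of the log-odds is infinite whenever all outcomes agree. In this single-binomial special case the Firth-type mean-bias-reduced estimate amounts to adding half an observation to each outcome, a textbook result we sketch below; the talk's generic algorithm covers far more general models than this.

```python
import numpy as np

def logit_mle(y, n):
    """Maximum likelihood estimate of the log-odds from y successes in n
    trials; diverges to +/- infinity when y equals 0 or n."""
    p = y / n
    with np.errstate(divide="ignore"):
        return np.log(p) - np.log1p(-p)

def logit_firth(y, n):
    """Firth-type mean-bias-reduced estimate: in the single-binomial case the
    adjusted score is solved by adding 1/2 to both successes and failures."""
    p = (y + 0.5) / (n + 1)
    return np.log(p) - np.log1p(-p)

# With zero observed successes the MLE is -infinity, while the bias-reduced
# estimate remains finite (and shrinks towards zero log-odds):
print(logit_mle(0, 10))     # -inf
print(logit_firth(0, 10))
```

Finiteness of the bias-reduced estimate is not an accident of this example; it is one of the practical benefits of bias reduction in binomial-response models generally.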

18th October 2013

Stephen Jenkins (LSE)
Title: Regression Analysis of Country Effects using Multilevel Data
Abstract: 'Country effects' on outcomes for individuals are often analysed using multilevel (hierarchical) models applied to harmonised multi-country datasets such as the ESS, EU-SILC, EVS, ISSP, and SHARE. We point out problems with the assessment of country effects that appear not to be widely appreciated by social science researchers, and develop our arguments using Monte Carlo simulation analysis of linear and logit mixed models. With large sample sizes of individuals within each country but only a small number of countries, analysts can reliably estimate individual-level effects, but estimates of parameters summarising country effects are likely to be unreliable. Multilevel modelling methods are no panacea.
(The talk is based on an ISER Working Paper downloadable from: https://www.iser.essex.ac.uk/publications/workingpapers/iser/201314.)
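The core point, that many individuals per country cannot compensate for few countries, can be seen even with a crude method-of-moments estimator in a random-intercept model. The simulation below is our own toy illustration of that logic, not the paper's mixed-model Monte Carlo design:

```python
import numpy as np

def between_var_estimates(J, n, tau2, sigma2, reps, rng):
    """Monte Carlo for a random-intercept model y_ij = u_j + e_ij, with
    u_j ~ N(0, tau2) over J countries and e_ij ~ N(0, sigma2) within
    country. Returns method-of-moments estimates of the country-level
    variance tau2 (variance of country means minus sigma2/n), one per
    replication."""
    out = []
    for _ in range(reps):
        u = rng.normal(0.0, np.sqrt(tau2), J)
        # the mean of n within-country errors has standard deviation
        # sqrt(sigma2 / n), so country means can be drawn directly:
        ybar = u + rng.normal(0.0, np.sqrt(sigma2 / n), J)
        out.append(ybar.var(ddof=1) - sigma2 / n)
    return np.array(out)

rng = np.random.default_rng(2)
few = between_var_estimates(J=10, n=1000, tau2=0.5, sigma2=1.0, reps=500, rng=rng)
many = between_var_estimates(J=100, n=1000, tau2=0.5, sigma2=1.0, reps=500, rng=rng)
# With only 10 countries the tau2 estimate is far more variable than with
# 100 countries, even though each country has 1000 individuals:
print(few.std(), many.std())
```

The Monte Carlo spread of the country-level variance estimate is governed almost entirely by J, not by n, which is why country-effect parameters stay unreliable in typical harmonised datasets with 20-30 countries.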


