Joint Statistics and Econometrics Seminar Series

Statistics takes the numbers that you have and summarises them into fewer numbers that are easily digestible by the human brain.

This seminar series is a joint partnership with the STICERD Econometrics programme.

All joint Statistics and Econometrics seminars during Lent Term 2019 will take place from 12.00pm to 1.00pm and will be preceded by refreshments from 11.45am. Unless otherwise specified, the seminars will take place in COL 6.15 (Leverhulme Library), 6th Floor of Columbia House. 

Current seminars in Lent term 2019


Friday 29th March 2019, 12pm in COL 6.15 - Prof. Duo Qin from SOAS, University of London

Title: Let’s take the bias out of Econometrics 

Abstract: This study exposes the cognitive flaws of ‘endogeneity bias’. It examines how conceptualisation of the bias has evolved to embrace all major econometric problems, despite extensive lack of hard evidence. It reveals the crux of the bias – a priori rejection of causal variables as conditionally valid ones, and of the bias correction by consistent estimators – modification of those variables by non-uniquely and non-causally generated regressors. It traces the flaws to misconceptions about error terms and estimation consistency. It highlights the need to shake off the bias to let statistical learning play an active and formal role in econometrics.  

The talk is based on a recently published paper.

Friday 22nd March 2019, 12pm in 32L.G.03 - Victor Chernozhukov, MIT

Title: Double/debiased machine learning for treatment and structural parameters 

Abstract: We revisit the classic semi‐parametric problem of inference on a low‐dimensional parameter θ0 in the presence of high‐dimensional nuisance parameters η0. We depart from the classical setting by allowing for η0 to be so high‐dimensional that the traditional assumptions (e.g. Donsker properties) that limit complexity of the parameter space for this object break down. To estimate η0, we consider the use of statistical or machine learning (ML) methods, which are particularly well suited to estimation in modern, very high‐dimensional cases. ML methods perform well by employing regularization to reduce variance and trading off regularization bias with overfitting in practice. However, both regularization bias and overfitting in estimating η0 cause a heavy bias in estimators of θ0 that are obtained by naively plugging ML estimators of η0 into estimating equations for θ0. This bias results in the naive estimator failing to be √N-consistent, where N is the sample size. We show that the impact of regularization bias and overfitting on estimation of the parameter of interest θ0 can be removed by using two simple, yet critical, ingredients: (1) using Neyman‐orthogonal moments/scores that have reduced sensitivity with respect to nuisance parameters to estimate θ0; (2) making use of cross‐fitting, which provides an efficient form of data‐splitting. We call the resulting set of methods double or debiased ML (DML). We verify that DML delivers point estimators that concentrate in a N^(-1/2)-neighbourhood of the true parameter values and are approximately unbiased and normally distributed, which allows construction of valid confidence statements.
The generic statistical theory of DML is elementary and simultaneously relies on only weak theoretical requirements, which will admit the use of a broad array of modern ML methods for estimating the nuisance parameters, such as random forests, lasso, ridge, deep neural nets, boosted trees, and various hybrids and ensembles of these methods. We illustrate the general theory by applying it to provide theoretical properties of the following: DML applied to learn the main regression parameter in a partially linear regression model; DML applied to learn the coefficient on an endogenous variable in a partially linear instrumental variables model; DML applied to learn the average treatment effect and the average treatment effect on the treated under unconfoundedness; DML applied to learn the local average treatment effect in an instrumental variables setting. In addition to these theoretical applications, we also illustrate the use of DML in three empirical examples.
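The cross-fitting recipe described in the abstract can be sketched in a few lines. The following is a minimal illustration for the partially linear model Y = θ0·D + g(X) + ε, not the authors' code; the choice of random forests, 5 folds, and the simulated nuisance functions are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.normal(size=(n, p))
g = np.sin(X[:, 0]) + X[:, 1] ** 2        # nuisance in the outcome equation
m = np.cos(X[:, 0])                       # nuisance in the treatment equation
theta0 = 1.5
D = m + rng.normal(size=n)                # treatment
Y = theta0 * D + g + rng.normal(size=n)   # outcome

# Cross-fitting: estimate nuisances on the training folds and
# residualise only on the held-out fold, so each observation's
# residual never uses its own data for nuisance estimation.
res_Y = np.empty(n)
res_D = np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    mY = RandomForestRegressor(random_state=0).fit(X[train], Y[train])
    mD = RandomForestRegressor(random_state=0).fit(X[train], D[train])
    res_Y[test] = Y[test] - mY.predict(X[test])
    res_D[test] = D[test] - mD.predict(X[test])

# Neyman-orthogonal moment: regress outcome residuals on treatment residuals.
theta_hat = res_D @ res_Y / (res_D @ res_D)
print(theta_hat)
```

The residual-on-residual regression is first-order insensitive to errors in the two ML fits, which is exactly why the naive plug-in bias disappears here.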

Friday 15th March 2019 12pm in COL 6.15 - Degui Li, University of York

Title: Nonparametric Homogeneity Pursuit in Functional-Coefficient Models

Abstract: This paper explores the homogeneity of coefficient functions in nonlinear models with functional coefficients and identifies the underlying semiparametric modelling structure. With initial kernel estimates of coefficient functions, we combine the classic hierarchical clustering method with a generalised version of the information criterion to estimate the number of clusters, each of which has a common functional coefficient, and determine the membership of each cluster. To identify a possible semi-varying coefficient modelling framework, we further introduce a penalised local least squares method to determine zero coefficients, non-zero constant coefficients and functional coefficients which vary with an index variable. Through the nonparametric kernel-based cluster analysis and the penalised approach, we can substantially reduce the number of unknown parametric and nonparametric components in the models, thereby achieving the aim of dimension reduction. Under some regularity conditions, we establish the asymptotic properties for the proposed methods including the consistency of the homogeneity pursuit. Numerical studies, including Monte-Carlo experiments and an empirical application, are given to demonstrate the finite-sample performance of our methods.
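The clustering step described above can be illustrated on simulated coefficient curves. This is a hedged sketch only: the curve shapes, the noise level, the fixed cluster count, and the use of Ward linkage are my assumptions, not the paper's method.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
u = np.linspace(0, 1, 50)                 # grid of the index variable

# Ten "estimated" coefficient curves from two homogeneity groups plus noise,
# standing in for the initial kernel estimates of the coefficient functions.
group = np.array([0] * 5 + [1] * 5)
base = np.vstack([np.sin(2 * np.pi * u), u ** 2])
curves = base[group] + 0.05 * rng.normal(size=(10, 50))

# Classic hierarchical clustering on the curves; in the paper a generalised
# information criterion chooses the number of clusters, fixed here at 2.
labels = fcluster(linkage(curves, method="ward"), t=2, criterion="maxclust")
print(labels)
</n```

Curves in the same cluster would then share one common functional coefficient, reducing the number of nonparametric components to estimate.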

Friday 1st March 2019 12pm in COL 6.15 - Filipa Sa, King's College London

Title: The Effect of University Fees on Applications, Attendance and Course Choice: Evidence from a Natural Experiment in the UK 

Abstract: Over the past two decades, large changes have been introduced to the level of university fees in the UK, with significant variation across countries. This paper exploits this variation to examine the effect of fees on university applications, attendance and course choice. It finds that applications decrease in response to higher fees with an elasticity of demand of about -0.4. Attendance also decreases. The reduction in applications and attendance is larger for courses with lower salaries and employment rates after graduation, for non-STEM subjects, and for less selective universities.

Friday 1st February 2019, 12pm in NAB 1.07 - Roger Koenker, UCL 

Title: Nonparametric maximum likelihood methods for binary response models with random coefficients

Abstract: Single index linear models for binary response with random coefficients have been extensively employed in many settings under various parametric specifications of the distribution of the random coefficients. Nonparametric maximum likelihood estimation (NPMLE), as proposed by Kiefer and Wolfowitz (1956), has in contrast received less attention in applied work, due primarily to computational difficulties. We propose a new approach to computation of NPMLEs for binary response models that significantly increases their computational tractability, thereby facilitating greater flexibility in applications. Our approach, which relies on recent developments involving the geometry of hyperplane arrangements by Rada and Cerny (2018), is contrasted with the deconvolution method of Gautier and Kitamura (2013).

**Please note that this talk will be held in the New Academic Building room 1.07**


Past seminars in Lent term 2018 

23rd March 2018 - Marc Hallin from Université libre de Bruxelles

Marc Hallin is a Professor at Université libre de Bruxelles. His talk details are below.

Title: Optimal dimension reduction for vector and functional time series.

Abstract: Dimension reduction techniques are at the core of the statistical analysis of high-dimensional observations. Whether the data are vector- or function-valued, principal component techniques play a central role in this context. The success of principal components in the dimension reduction problem is explained by the fact that, for any K<=p, the first K coefficients in the expansion of a p-dimensional random vector X in terms of its principal components provide the best linear K-dimensional summary of X in the mean square sense. This optimality feature, however, no longer holds true in a time series context: when the observations are serially dependent, principal components lose their optimal dimension reduction property to the so-called "dynamic principal components" introduced by Brillinger in 1981 in the vector case and, in the functional case, to their functional extension proposed by Hormann, Kidzinski and Hallin (JRSS Ser. B, 2015). Principal components similarly are central tools in the estimation of factor models: traditional principal components in the approach proposed by Stock and Watson (JASA 2002) or Bai and Ng (Econometrica 2002); dynamic ones for Forni et al. (Review of Economics and Statistics 2000). The optimal dimension reduction properties of dynamic principal components explain why the latter, in general, are more parsimonious, and perform better, under less restrictive assumptions.

9th March 2018 - Majid Al-Sadoon from Universitat Pompeu Fabra

Majid Al-Sadoon is an Assistant Professor at Universitat Pompeu Fabra. His talk details are below.

Title: The Identification Problem for Linear Rational Expectations Models

Abstract: This paper considers the identification of stationary unique invertible solutions to linear rational expectations models from the spectral density matrix of the observable data. The paper provides an analytic characterisation of the sets of observationally equivalent models and presents necessary and sufficient conditions for identification that generalise classical results on the identification of ARMA models. (joint with Piotr Zwiernik, UPF)

23rd February 2018 - Daniel Pena from Universidad Carlos III de Madrid

Daniel Pena is a Professor at Universidad Carlos III de Madrid. His talk details are below.

Title: Forecasting Multiple Time Series with One-Sided Dynamic Principal Components

Abstract: We define one-sided dynamic principal components (ODPC) for time series as linear combinations of the present and past values of the series that minimize the reconstruction mean squared error. Previous definitions of dynamic principal components depend on past and future values of the series; for this reason, they are not appropriate for forecasting purposes. By contrast, it is shown that the ODPC introduced in this paper can be successfully used for forecasting high-dimensional multiple time series. An alternating least squares algorithm to compute the proposed ODPC is presented. We prove that for stationary and ergodic time series the estimated values converge to their population analogues. We also prove that asymptotically, when both the number of series and the sample size go to infinity, if the data follow a dynamic factor model, the reconstruction obtained with the ODPC converges, in mean squared error, to the common part of the factor model. Monte Carlo results show that forecasts obtained by the ODPC compare favourably with other forecasting methods based on dynamic factor models.

26th January 2018 - Yingying Fan from University of Southern California

Yingying Fan is Associate Professor at University of Southern California. Her talk details are below.

Title: RANK: Large-Scale Inference with Graphical Nonlinear Knockoffs

Abstract: Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this paper, we provide theoretical foundations on the power and robustness of the model-free knockoffs procedure introduced recently in Candès, Fan, Janson and Lv (2016) in the high-dimensional setting where the covariate distribution is characterized by a Gaussian graphical model. We establish that, under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as the sample size goes to infinity. When moving away from the ideal case, we suggest a modified model-free knockoffs method called graphical nonlinear knockoffs (RANK) to accommodate the unknown covariate distribution. We provide theoretical justifications of the robustness of our modified procedure by showing that the false discovery rate (FDR) is asymptotically controlled at the target level and the power is asymptotically one with the estimated covariate distribution. To the best of our knowledge, this is the first formal theoretical result on the power of the knockoffs procedure. Simulation results demonstrate that, compared to existing approaches, our method performs competitively in both FDR control and power. A real data set is analyzed to further assess the performance of the suggested knockoffs procedure.

12th January 2018 in CLM.3.02 - Carsten Jentsch from Universität Mannheim 

Carsten Jentsch is a Professor at Universität Mannheim. His talk details are below.

Title: Statistical inference on party positions from texts: statistical modeling, bootstrap and adjusting for time effects 

Abstract: One central task in comparative politics is to locate party positions in a certain political space. For this purpose, several empirical methods have been proposed using text as a data source. In general, the analysis of texts to extract information is a difficult task: their data structure is very complex, and political texts usually contain a large number of words, such that a simultaneous analysis of word counts becomes challenging. In this paper, we consider Poisson models for each word count simultaneously and provide a statistical analysis suitable for political text data. In particular, we allow for multi-dimensional party positions and develop a data-driven way of determining the dimension of positions. Allowing for multi-dimensional political positions gives new insights into the evolution of party positions and helps our understanding of a political system. Additionally, we consider a novel model which allows the political lexicon to change over time, and develop an estimation procedure based on LASSO and fused LASSO penalization techniques to address high-dimensionality via significant dimension reduction. The latter model extension gives more insight into the potentially changing use of words by left- and right-wing parties over time. Furthermore, the procedure is capable of automatically identifying words that discriminate between party positions. To address the potential dependence structure of the word counts over time, we include integer-valued time series processes in our modelling approach and implement a suitable bootstrap method to construct confidence intervals for the model parameters. We apply our approach to party manifesto data from German parties over all seven federal elections after German reunification. The approach is simple to implement, as it requires neither a priori information (from external sources) nor expert knowledge to process the data. The data studies confirm that our procedure is robust, runs stably, and leads to meaningful and interpretable results.

*Please note that this seminar takes place in CLM.3.02, 3rd floor of Clement House instead of the Leverhulme Library*

During Michaelmas term, seminars take place on Fridays, 12-1pm, in 32L.LG.03 (Lower Ground Floor, LSE, 32 Lincoln's Inn Fields, London, WC2A 3PH) unless otherwise stated.

Past seminars in Michaelmas term 2017  

Please have a look at the STICERD website for details on the past seminars of MT 2017. 

Past MT seminars in 2016

Javier Hidalgo (LSE)

9th December 2016 - 32L.LG.03

Speaker - Javier Hidalgo (LSE)

Title - TBC

Inference on trending panel data

2nd December 2016 - 32L.LG.03

Speaker - Peter Robinson (LSE)

Title - Inference on trending panel data

Misspecification testing in spatial autoregressive models

25th November 2016 - 32L.LG.03

Speaker - Yungyoon Lee (Royal Holloway, University of London) 

Title - Misspecification testing in spatial autoregressive models

Nonseparable unobserved heterogeneity and partial identification in IV models for count outcomes

18th November 2016 - 32L.LG.03 

Speaker - Dongwoo Kim (UCL)

Title - Nonseparable unobserved heterogeneity and partial identification in IV models for count outcomes

A new approach for dynamic event count data models

11th November 2016 - 32L.LG.03 

Speaker - Namhyun Kim (Exeter University)

Title - A new approach for dynamic event count data models

Patrick Wongsa-Art (Newcastle University)

10th November 2016 - 32L.LG.03 

Speaker - Patrick Wongsa-Art (Newcastle University)

Title - TBC

Partial independence in nonseparable models

4th November 2016 -  32L.LG.03 

Speaker - Matt Masten (Duke University), joint with Alexandre Poirier

Title - Partial independence in nonseparable models

Quantile methods for first-price auctions: a signal approach

7th October 2016 - 32L.LG.03 

Speaker - Emmanuel Guerre (QMW), joint with Nathalie Gimenes

Title - Quantile methods for first-price auctions: a signal approach

Optimal two-sided tests for instrumental variables regression with heteroskedastic and autocorrelated errors

30th September 2016 - 32L.LG.03 

Speaker - Marcelo Moreira (Fundação Getúlio Vargas (FGV/EPGE)), joint with Humberto Moreira.

Title - Optimal two-sided tests for instrumental variables regression with heteroskedastic and autocorrelated errors


Past LT seminars in 2017 

The uncertainty of principal components in dynamic factor models

Esther Ruiz

Universidad Carlos III

24th March 2017 - 12-1pm in the Leverhulme Library COL.6.15

Title - The uncertainty of principal components in dynamic factor models

Abstract - Dynamic Factor Models (DFM) are often fitted to large systems of multivariate time series to represent the evolution of underlying factors. Given that these factors are usually unobserved, to correctly interpret their estimated counterparts, one needs a measure of their uncertainty. In the context of very large systems of economic and financial variables, it is popular to extract factors using the computationally easy although non-efficient Principal Components (PC) procedure.

The asymptotic distribution of factors extracted by PC is known. However, for the sample sizes and cross-sectional dimensions usually encountered in practice, the asymptotic distribution is not an appropriate approximation to the finite sample one. We propose using bootstrap procedures to approximate the finite sample distribution of the factors extracted by PC to have a realistic picture of their associated uncertainty.

The finite sample properties of the proposed procedure are analyzed and compared with those of the asymptotic distribution and alternative bootstrap procedures previously proposed in the context of DFMs. The results are empirically illustrated by obtaining confidence intervals for the underlying factor in a system of Spanish macroeconomic variables and in a system of house prices of advanced and emerging markets. Joint work with Javier de Vicente.
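The idea of bootstrapping PC-extracted factors can be sketched as follows. This is a generic residual-bootstrap illustration under a one-factor toy model, not the authors' procedure; the resampling scheme, sign alignment, and percentile band are my simplifications.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, B = 200, 50, 500
f = rng.normal(size=T)                          # latent factor
lam = rng.normal(size=N)                        # loadings
X = np.outer(f, lam) + rng.normal(size=(T, N))  # observed panel

def pc_factor(X):
    # First principal component of the T x N panel via SVD.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, 0] * s[0] / np.sqrt(N)

fhat = pc_factor(X)
common = np.outer(fhat, X.T @ fhat / (fhat @ fhat))  # fitted common component
resid = X - common

# Residual bootstrap: rebuild the panel with resampled idiosyncratic errors,
# re-extract the factor, and align signs (which PCA leaves undetermined).
draws = np.empty((B, T))
for b in range(B):
    fb = pc_factor(common + resid[rng.integers(0, T, size=T)])
    draws[b] = fb if fb @ fhat > 0 else -fb

lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)   # pointwise 95% band
```

The width of the band (hi - lo) at each date then gives a finite-sample picture of the uncertainty attached to the PC-extracted factor, which the asymptotic distribution may understate for moderate T and N.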

Sequential testing for structural stability in approximate factor models

Lorenzo Trapani

Cass Business School

10th March 2017 - 12-1pm in the Leverhulme Library COL.6.15

Title - Sequential testing for structural stability in approximate factor models

Abstract - We develop a family of monitoring procedures to detect a change in a large factor model. Our statistics are based on the following property of the (r+1)-th eigenvalue of the sample covariance matrix of the data: whilst under the null the (r+1)-th eigenvalue is bounded, under the alternative of a change (either in the loadings, or in the number of factors itself) it becomes spiked. Given that the sample eigenvalue does not have a known limiting distribution under the null, we regularise the problem by randomising the test statistic in conjunction with sample conditioning, obtaining a sequence of i.i.d., asymptotically chi-squared statistics which are then employed to build the monitoring scheme. Numerical evidence shows that our procedure works very well in finite samples, with a very small probability of false detections and tight detection times in the presence of a genuine change point. Joint with Matteo Barigozzi.

Detection of periodicity in functional time series

Siegfried Hörmann

Université libre de Bruxelles

24th February 2017 - 12-1pm in the Leverhulme Library COL.6.15

Title - Detection of periodicity in functional time series

Abstract - Periodicity is one of the most important characteristics of time series, and tests for periodicity go back to the very origins of the field. The importance of such tests has manifold reasons. One of them is that most inferential procedures require that the series be stationary, but classical stationarity tests (e.g. KPSS procedures) have little power against a periodic component in the mean.

In this account we respond to the need to develop periodicity tests for functional time series (FTS). Examples of FTS include annual temperature or smoothed precipitation curves, daily pollution level curves, various daily curves derived from high-frequency asset price data, daily bond yield curves, daily vehicle traffic curves and many others. One of the important contributions of this article is the development of a fully functional ANOVA test for stationary data. If the functional time series (Yt) satisfies a certain weak-dependence condition then, using a frequency domain approach, we obtain the asymptotic null distribution (under the constant mean hypothesis) of the functional ANOVA statistic.

The limiting distribution has an interesting form and can be written as a sum of independent hypoexponential variables whose parameters are eigenvalues of the spectral density operator of (Yt). To the best of our knowledge, there exists no comparable asymptotic result in the FDA literature. Adapting ANOVA for dependence is one way to conduct periodicity analysis; it is suitable when the periodic component has no particular form. If, however, the alternative is more specific or the period is large, then we can construct simpler and more powerful tests. We hence introduce three different models of increasing complexity and develop the appropriate test statistics.

The power advantage will be illustrated in simulations and by a theoretical case study where we consider local consistency results for three specific alternatives. A common approach to inference for functional data is to project observations onto a low-dimensional basis system and then to apply a suitable multivariate procedure to the vector of projections. This approach will also be explained and discussed.

The talk is based on joint work with Piotr Kokoszka (Colorado State University) and Gilles Nisol (ULB). 

Testing uniformity on high-dimensional spheres against monotone rotationally symmetric alternatives

Davy Paindaveine

Université libre de Bruxelles

27th January 2017 - 12-1pm in the Leverhulme Library COL.6.15

Title - Testing uniformity on high-dimensional spheres against monotone rotationally symmetric alternatives

Abstract - We consider the problem of testing uniformity on high-dimensional unit spheres. We are primarily interested in non-null issues. We show that rotationally symmetric alternatives lead to two Local Asymptotic Normality (LAN) structures.

The first one is for fixed modal location θ and allows one to derive locally asymptotically most powerful tests under specified θ. The second one, which addresses the Fisher–von Mises–Langevin (FvML) case, relates to the unspecified-θ problem and shows that the high-dimensional Rayleigh test is locally asymptotically most powerful invariant. Under mild assumptions, we derive the asymptotic non-null distribution of this test, which allows us to extend away from the FvML case the asymptotic powers obtained there from Le Cam's third lemma.

Throughout, we allow the dimension p to go to infinity in an arbitrary way as a function of the sample size n. Some of our results also strengthen the local optimality properties of the Rayleigh test in low dimensions. We perform a Monte Carlo study to illustrate our asymptotic results. Finally, we treat an application related to testing for sphericity in high dimensions.

Joint work with Christine Cutting and Thomas Verdebout.

Some recent progress on nonlinear spatial modelling: a personal review

Zudi Lu

University of Southampton

13th January 2017 - 12-1pm in the Leverhulme Library COL.6.15

Title - Some recent progress on nonlinear spatial modelling: A personal review

Abstract - Larger amounts of spatial or spatiotemporal data with more complex structures collected at irregularly spaced sampling locations are prevalent in a wide range of disciplines. With few exceptions, however, practical statistical methods for nonlinear modeling and analysis of such data remain elusive. In this talk, I provide a review on some developments and progress of the research that my co-authors and I have recently done.

In particular, we will look at some nonparametric methods for probability density estimation, including joint densities, and semiparametric models for a class of spatio-temporal nonlinear regressions permitting a possibly nonlinear relationship between response and covariates, with location-dependent spatial neighbouring and temporal lag effects taken into account. In the setting of semiparametric spatiotemporal modelling, a computationally feasible data-driven method is also shown for spatial weight matrix estimation. For illustration, our methodology is applied to investigate some land and housing price data sets.

Bootstrap inference under random distributional limits

Giuseppe Cavaliere

University of Bologna

13th January 2017 - 12-1pm in the Leverhulme Library COL.6.15

Title - Bootstrap inference under random distributional limits

Abstract - Asymptotic bootstrap validity is usually understood and established as consistency of the distribution of a bootstrap statistic, conditional on the data, for the unconditional limit distribution of a statistic of interest. Cases where the limit measure induced by the bootstrap is random are therefore regarded as cases where bootstrap inference is invalid.

However, apart from possessing at most one unconditional limit distribution under a fixed asymptotic scheme, a statistic in general may possess a host of conditional (random) limit distributions, depending on the choice of the conditioning sets. We discuss the appropriate probabilistic tools for establishing asymptotic bootstrap validity, in terms of asymptotic distributional uniformity of bootstrap p-values, in the case where the distribution of the bootstrap statistic conditional on the data estimates consistently a conditional limit distribution of a statistic, in a sense weaker than the usual weak convergence in probability.

We provide two general sufficient conditions for bootstrap validity in cases where weak convergence in probability fails. Finally, we apply our result to tests of parameter constancy in a general regression model, providing a rigorous analysis of the validity of inference based on the fixed regressor bootstrap.

Joint work with Iliyan Georgiev.