Department of Statistics

Columbia House

London School of Economics

Houghton Street

London

WC2A 2AE

 

General enquiries about events and seminars in the Department of Statistics

Email: statistics.events@lse.ac.uk

 




Big Data Initiative Seminar Series 2014-15

The Department of Statistics has set up a Big Data Initiative to coordinate activities on this topic, including organising a programme of big data themed seminars, starting in the 2014-15 academic session. Seminars will be published here as they are confirmed. 

 


Friday 14 November 2014, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Maps and directions

Paul Nulty
LSE (Department of Methodology)

Title: Tools and methods for quantitative text analysis

Abstract: In this talk I present an overview of methods used for quantitative analysis of large text corpora. I begin by describing practical issues involved in using software to retrieve information from large text files, online text, and social media text streams. I then discuss how text is transformed for quantitative analysis by extracting a word frequency matrix or other relevant features for machine learning, and describe software in development on the QUANTESS project to facilitate this process. Finally, I discuss the statistical properties of natural language text, and present ongoing research on improving methods for extracting features from text for use with standard machine learning algorithms, with application to the scaling of political texts.
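
As a rough illustration of the word frequency matrix mentioned in the abstract, the following minimal Python sketch counts word occurrences per document. It is not the QUANTESS software: the function names, the tokenisation rule (lower-cased runs of letters and apostrophes) and the two sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case word tokens (a deliberately simple, assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequency_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts vocabulary[j] in documents[i]."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is the sorted union of all words seen in any document.
    vocab = sorted(set().union(*counts))
    # A Counter returns 0 for words that are absent from a document.
    matrix = [[doc_counts[word] for word in vocab] for doc_counts in counts]
    return vocab, matrix


# Hypothetical two-document corpus, for illustration only.
docs = ["The cat sat.", "The dog sat on the mat."]
vocab, matrix = word_frequency_matrix(docs)
for word, column in zip(vocab, zip(*matrix)):
    print(word, list(column))
```

Each row of the matrix corresponds to one document and each column to one word, so the rows can be passed as feature vectors to a standard machine learning algorithm.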


Friday 28 November 2014, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Maps and directions

Yang Feng
Columbia University

THIS SEMINAR HAS BEEN CANCELLED. FURTHER DETAILS WILL FOLLOW.

Title: Model selection in high-dimensional misspecified models

Abstract: Model selection is indispensable to high-dimensional sparse modelling in selecting the best set of covariates among a sequence of candidate models. Most existing work assumes implicitly that the model is correctly specified or of fixed dimensions. Yet model misspecification and high dimensionality are common in real applications. In this paper, we investigate two classical principles of model selection, the Kullback-Leibler divergence principle and the Bayesian principle, in the setting of high-dimensional misspecified models. Asymptotic expansions of these principles reveal that the effect of model misspecification is crucial and should be taken into account, leading to the generalized AIC and generalized BIC in high dimensions. With a natural choice of prior probabilities, we suggest the generalized BIC with prior probability, which involves a logarithmic factor of the dimensionality in penalizing model complexity. We further establish the consistency of the covariance contrast matrix estimator in a general setting. Our results and new method are supported by numerical studies.


Friday 6 February 2015, 12pm - 1pm, Room COL 6.15, Columbia House (sixth floor)
Maps and directions

Qing Pei
University of Hong Kong

Title: Nomadic destiny and mandate of heaven: a new perspective on the nomadic migration from environmental humanities

Abstract: The push force behind nomadic migration has been closely linked to climate change, but to date there is insufficient evidence to prove the link. Following the paradigm of Environmental Humanities, this study investigates the relationship between a 2000-year history of nomadic migration and climate change in historical China. Using different statistical methods and a large amount of updated data, the study answers several questions left open by past research about the relationship between climate change and nomadic migration, especially over the long term and on a large spatial scale. In addition, nomadic migration is a key factor influencing the alternating occupancy patterns of the country’s pastoral and agrarian polities. The long-term cyclical patterns of China’s geopolitical shifts are therefore further explained in terms of nomadic migration under the impact of climate change.


Friday 6 February 2015, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Map and Directions

Dino Sejdinovic
University of Oxford

Title: Hypothesis Testing with Kernel Embeddings on Big and Interdependent Data

Abstract: Embeddings of probability distributions into a reproducing kernel Hilbert space provide a flexible framework for non-parametric hypothesis tests, including two-sample, independence, and three-variable (Lancaster) interaction tests. In practice, two main limitations of this methodology are that it generally requires time (at least) quadratic in the number of observations and that the test correctness heavily relies on observations being independent. We overview how these tests can be scaled up to large datasets using mini-batch procedures, resulting in consistent tests suited to data streams or to situations when the observations cannot be stored in memory. Kernel selection can also be performed on-the-fly in order to maximize the asymptotic efficiency of these tests. Furthermore, we show consistency of a wild bootstrap procedure for kernel-based tests on random processes, and demonstrate its use in the study of dependence between time series across multiple time lags.
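
To make the cost issue in the abstract concrete, here is a minimal Python sketch of a kernel two-sample statistic that runs in time linear in the number of observations, by averaging over disjoint pairs rather than over all pairs. This is the well-known linear-time estimate of the squared maximum mean discrepancy, which corresponds to the smallest possible batch size; it is not necessarily the mini-batch procedure presented in the talk. The Gaussian kernel, its bandwidth, the one-dimensional data and all function names are assumptions made for this example.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel on real numbers (kernel and bandwidth are assumed choices)."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def linear_time_mmd2(xs, ys, bandwidth=1.0):
    """Linear-time estimate of squared MMD, averaged over disjoint pairs of observations."""
    n_pairs = min(len(xs), len(ys)) // 2
    if n_pairs == 0:
        raise ValueError("need at least two observations in each sample")
    total = 0.0
    for i in range(n_pairs):
        x1, x2 = xs[2 * i], xs[2 * i + 1]
        y1, y2 = ys[2 * i], ys[2 * i + 1]
        # Within-sample similarity minus cross-sample similarity for this pair.
        total += (gaussian_kernel(x1, x2, bandwidth)
                  + gaussian_kernel(y1, y2, bandwidth)
                  - gaussian_kernel(x1, y2, bandwidth)
                  - gaussian_kernel(x2, y1, bandwidth))
    return total / n_pairs


# Simulated data, for illustration only: two samples from the same
# distribution, and one sample whose mean is shifted by 2.
rng = random.Random(0)
sample_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
sample_b = [rng.gauss(0.0, 1.0) for _ in range(2000)]
sample_shifted = [rng.gauss(2.0, 1.0) for _ in range(2000)]

print("same distribution:", linear_time_mmd2(sample_a, sample_b))
print("shifted mean:     ", linear_time_mmd2(sample_a, sample_shifted))
```

Each observation is used once and then discarded, so the statistic can be computed on a data stream without storing the sample. The price is a higher variance than the quadratic-time estimate, and the sketch assumes independent observations, which is exactly the assumption the abstract says fails for time series.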


Friday 20 February 2015, 2pm - 3pm, Room CLM 3.02, Clement House (third floor)
(Sandwiches and refreshments available at 1pm)
Maps and Directions

David Hand
Imperial College London

Title: From Big Data to beyond Data: Extracting the Value

Abstract: We are inundated with messages about the promise offered by big data. Economic miracles, scientific breakthroughs, technological leaps appear to be merely a matter of taking advantage of a resource which is increasingly widely available. But is everything as straightforward as these promises seem to imply? I look at the history of big data, distinguish between different kinds of big data, and explore whether we really are at the start of a revolution. No new technology is achieved without effort and without overcoming obstacles, and I describe some such obstacles that lie in the path of realising the promise of big data.


Friday 13 March 2015, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Map and Directions

Eric Kolaczyk
Boston University

Title: Statistical Analysis of Network Data in the Context of 'Big Data': Large Networks and Many Networks

Abstract: One of the key challenges in the current era of 'Big Data' is the ubiquity of structured data, and one particularly prominent example of such data is network data. In this talk we look at two of the ways that network data can be 'big': in the sense of networks of many nodes, and in the sense of many networks. Within this context, I will present two vignettes showing how network versions of quite fundamental statistical problems have yet to be addressed.

Specifically, I will touch on the problems of (i) propagation of uncertainty to summary statistics of 'noisy' networks, and (ii) estimation and testing for large collections of network data objects. In both cases I will present a formalization of a certain class of problems encountered frequently in practice, describe our work in addressing the core aspects of the problem, and point to some of the many outstanding challenges remaining.


Friday 20 March 2015, 2pm - 3pm, Room COL 6.15, Columbia House (sixth floor)
Maps and Directions

Sofia Olhede
University College London

Title: Understanding Large Networks using Blockmodels

Abstract: Networks have become pervasive in practical applications. Understanding large networks is hard, especially because a number of typical features present in such observations create technical challenges for analysis. I will discuss some basic network models that are tractable for analysis, what sampling properties they can reproduce, and some results relating to their inference. I will especially touch on the importance of the stochastic block model as an analysis tool.

This is joint work with Patrick Wolfe (UCL).

