SEDS seminar

Past SEDS Seminars

Pursuing the UN Data Revolution for Sustainable Development
Speaker: Dr Viktoria Spaiser, University Academic Fellow in Political Science Informatics, University of Leeds
Date: 13 December 2018
Abstract: In August 2014 the UN established an Independent Expert Advisory Group to make concrete recommendations on bringing about a data revolution in sustainable development. The hope has been that data analytics would help to deal with the enormous challenge of achieving sustainable development globally. But what does existing data actually tell us about the challenge and potential solutions? And what other data do we need in order to understand the problem in all its dimensions? Dr Viktoria Spaiser will focus on the global and the individual levels of the sustainability challenge. She will discuss recent studies that she conducted with colleagues, modelling the compatibility of the UN Sustainable Development Goals on the one hand and studying environmental behaviour in a field-experimental setup on the other. Looking globally at the empirically measurable conflict between various Sustainable Development Goals, Dr Spaiser will explain what cross-country time-series data tells us about the nature of the inconsistencies. In this context, she will also discuss a recent extension of the original study, examining the different conclusions that can be drawn about the sustainability challenge depending on how the Sustainable Development Goals are operationalized. Specifically, she will show why it matters whether we look at production-based or consumption-based CO2 emissions when pursuing the Sustainable Development Agenda. Dr Spaiser will then change perspective and look at the sustainability challenge from an individual-level angle. The core question here is: how can we encourage more pro-environmental behaviour? She will discuss a recently conducted pilot study that uses smartphones to collect daily environmental behaviour data in a field-experimental setup. Dr Spaiser will conclude with a programmatic note on the road ahead for the sustainability research she envisions.

Combining Forecasts in the Presence of Ambiguity over Correlation Structures
Speaker: Professor Gilat Levy (joint with Professor Ronny Razin), Department of Economics, LSE
Date: 15 November 2018
Abstract: We suggest a framework to analyse how sophisticated decision makers combine multiple sources of information to form predictions. In particular, we focus on situations in which: 
(i) Decision makers understand each information source in isolation but are uncertain about the correlation between the sources; 
(ii) Decision makers consider a range of bounded correlation scenarios to yield a set of possible predictions; 
(iii) Decision makers face ambiguity in relation to the set of predictions they consider. 
In our model, the set of predictions the decision maker considers is completely characterised by two parameters: the naïve interpretation of forecasts, which ignores correlation, and the bound the decision maker places on the correlation between information sources. The analysis yields two countervailing effects on behaviour. 
First, when the naïve interpretation of information is relatively precise, it can induce risky behaviour, irrespective of what correlation scenario is chosen. Second, a higher correlation bound creates more uncertainty and therefore more conservative behaviour. We show how this trade-off affects behaviour in different applications, including financial investments and CDO ratings. We show that when faced with complex assets, decision makers are likely to behave in ways that are consistent with complete correlation neglect.
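A toy sketch can make the role of the correlation bound concrete (this is an illustrative construction, not the authors' model; the function name and variance inputs are assumptions): combine two unbiased forecasts with the minimum-variance linear weights implied by each admissible correlation scenario, and collect the resulting range of predictions.

```python
import numpy as np

def combined_forecasts(x1, x2, v1, v2, rho_bound, n=41):
    """Range of minimum-variance combinations of two unbiased forecasts
    (values x1, x2 with variances v1, v2) over all correlations within the bound."""
    preds = []
    for rho in np.linspace(-rho_bound, rho_bound, n):
        c = rho * np.sqrt(v1 * v2)          # covariance under this scenario
        w = (v2 - c) / (v1 + v2 - 2 * c)    # optimal weight on forecast 1
        preds.append(w * x1 + (1 - w) * x2)
    return min(preds), max(preds)

# With no correlation ambiguity the set is a single point; widening the
# bound widens the set of predictions the decision maker must entertain.
point = combined_forecasts(0.0, 1.0, 1.0, 2.0, rho_bound=0.0)
interval = combined_forecasts(0.0, 1.0, 1.0, 2.0, rho_bound=0.8)
```

A larger `rho_bound` yields a wider interval of predictions, which is the channel through which greater ambiguity induces the more conservative behaviour described above.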

Tourism and Terrorism: The Impact of News Reporting
Speaker: Professor Sir Timothy Besley, School Professor of Economics and Political Science and W. Arthur Lewis Professor of Development Economics, Department of Economics, LSE
Date: 25 October 2018
Abstract: This project looks at the relationship between violence and tourism. We first show that hotel prices, based on information from hotel booking web sites, respond negatively to violent incidents in tourist destinations. We then use monthly data on credit card spending from accounts in 120 countries in five tourist destinations from 2010 to 2017 to study the impact of violence on spending. To create a time-varying, country-specific measure of reporting on violence, we use a machine-learning algorithm to explore a range of news sources in multiple languages. We show that there is a strong relationship between news reporting and tourist spending in dyads. Moreover, country-level news coverage in the countries where the credit card accounts are based has an impact over and above that of the violent incidents themselves.

Electronic FX trading – where Game Theory meets Data Science
Speaker: Roel Oomen, Global co-head of electronic FX spot trading, Deutsche Bank, London
Date: 22nd February 2018
Abstract: In this talk, Roel will discuss recent developments in electronic FX trading and show how data science applied to dense (as opposed to big) financial data can be used to make practical decisions around execution optimisation. He will go over a number of case studies taken from a live trading environment.


Modelling Human Behaviour using Mobile Data
Speaker: Mirco Musolesi
Date: 1st February 2018
Abstract: We constantly generate digital traces in our online and offline lives, for example by using our smartphones, by interacting with everyday devices and the technological infrastructure of our cities, or simply by posting content on online social media platforms. This information can be used to model and possibly predict human behaviour in real-time, at a scale and granularity that were unthinkable just a few years ago. In this talk, Mirco will present recent work in modelling human behaviour using these “digital traces”, with a specific focus on mobile data. He will provide an overview of the methodological, algorithmic, and systems issues related to the development of solutions that rely on the online analysis and modelling of this type of data. As a case study, he will show how mobile phones can be used to collect and analyse mobility patterns of individuals in order to quantitatively understand how mental health problems affect their daily routines and behaviour, and how potential changes can be automatically detected. He will demonstrate that it is possible to observe a non-trivial correlation between mobility patterns and depressive mood using data collected by means of smartphones. Finally, he will also introduce his and his researchers' efforts in using cellular data for modelling mobility patterns of individuals at scale and their applications in the area of data for development.

Optimal Economic Design through Deep Learning
Speaker: David C. Parkes, Paulson School of Engineering and Applied Sciences, Harvard University
Date: 17 January 2018
Abstract: Designing an auction that maximizes expected revenue is an intricate task. Despite major efforts, only the single-item case is fully understood. We explore the use of tools from deep learning on this topic. The design objective that we adopt is revenue-optimal, dominant-strategy incentive-compatible auctions. As a baseline, we show that multi-layer neural networks can learn almost-optimal auctions for a variety of settings for which there are analytical solutions, even without leveraging characterization results. We also show that deep learning can be used to derive auctions for poorly understood problems, including settings with multiple items and budget constraints.

A Measure of Survey Mode Differences
Speaker: Jeff Gill, Department of Government, American University, Washington DC
Date: Tuesday 5 December 2017
Abstract: Jeff will evaluate the effects of different survey modes on respondents' patterns of answers using an entropy measure of variability. While “measures of centrality” show little difference between face-to-face and Internet surveys, he finds strong patterns of “distributional differences” between these modes, with Internet responses tending towards more diffuse positions due to the lack of personal contact during the process and of the social forces provided by that format. The results provide clear evidence that mode matters in modern survey research, and he will make recommendations for interpreting results from different modes.
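A minimal sketch of the kind of contrast an entropy measure can detect (toy data, not from this study; the function and both response vectors are invented for illustration): two samples with nearly identical means but very different dispersion.

```python
import math
from collections import Counter

def response_entropy(responses):
    """Shannon entropy (in bits) of the empirical answer distribution."""
    counts = Counter(responses)
    n = len(responses)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy 5-point survey item: similar means (2.9 vs 3.0), very different spread.
face_to_face = [3, 3, 3, 2, 3, 4, 3, 3, 2, 3]   # clustered answers
internet     = [1, 3, 5, 2, 3, 4, 1, 5, 2, 4]   # diffuse answers
```

A centrality measure barely distinguishes the two samples, while the entropy of the diffuse sample is markedly higher, mirroring the distributional difference between modes described above.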


Controlling Bias in Artificial Intelligence
Speakers: Nisheeth Vishnoi and Elisa Celis
Date: Wednesday 29 November 2017
Time: 3:30 - 5pm
Abstract: Bias is an increasingly observed phenomenon in the world of artificial intelligence (AI) and machine learning: from gender bias in online search to racial bias in court bail pleas to biases in the worldviews depicted in personalized newsfeeds. How are societal biases creeping into the seemingly “objective” world of computers and programs? At their core, what powers today's AI are algorithms for fundamental computational problems such as classification, data summarization, and online learning. Such algorithms have traditionally been designed with the goal of maximizing some notion of “utility”, and identifying or controlling bias in their output has not been a consideration. In this talk, Nisheeth and Elisa will explain the emergence of bias in algorithmic decision making and present the first steps towards developing a systematic framework to control biases in several of the aforementioned problems. This leads to new algorithms that have the ability to control and alleviate bias, often without a significant compromise to the utility that the current algorithms obtain.


Econometrics for Learning Agents
Speaker: Vasilis Syrgkanis, Microsoft
Date: 23 November 2017
Abstract: The traditional econometric approach to inferring properties of strategic interactions that are not fully observable in the data relies heavily on the assumption that the observed strategic behavior has settled at an equilibrium. This assumption is not robust in complex economic environments such as online markets, where players are typically unaware of all the parameters of the game in which they are participating and instead learn their utility only after taking an action. Behavioral models from online learning theory have recently emerged as an attractive alternative to the equilibrium assumption and have been extensively analyzed from a theoretical standpoint in the algorithmic game theory literature over the past decade. In this talk, Vasilis will present recent work in which he takes a learning-agent approach to econometrics, i.e. infers properties of the game, such as private valuations or the efficiency of the allocation, by assuming only that the observed repeated behavior is the outcome of a no-regret learning algorithm rather than a static equilibrium. He will also present some empirical results from applying these methods to datasets from Microsoft's sponsored search auction system.
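The no-regret assumption can be made concrete with a small sketch (illustrative only; `utilities` and `played` are hypothetical observed data, not the paper's estimator): external regret compares realized utility with the best single action in hindsight, and a no-regret learner drives this average quantity to zero as the number of rounds grows.

```python
def regret(utilities, played):
    """Average external regret of an observed play sequence.
    utilities[t][a]: utility of action a in round t; played[t]: chosen action."""
    T = len(played)
    realized = sum(utilities[t][played[t]] for t in range(T))
    # Best utility achievable by committing to one fixed action for all rounds.
    best_fixed = max(sum(u[a] for u in utilities) for a in range(len(utilities[0])))
    return (best_fixed - realized) / T
```

Under the learning-agent approach, observed behaviour is only required to make this quantity small, a much weaker restriction than equilibrium play.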

Joint work with Denis Nekipelov, Eva Tardos and Yichen Wang.

On Elicitation Complexity
Date: 26th October 2017
Speaker: Ian Kash, Microsoft Research
Abstract: Elicitation is the study of statistics or properties which are computable via empirical risk minimization. This has applications in understanding which loss function to use in a regression for a particular statistic or finding a surrogate loss function which is easier to optimize.

While several recent papers have approached the general question of which properties are elicitable, we suggest that this is the wrong question: all properties are elicitable by first eliciting the entire distribution or data set, and thus the important question is how elicitable. Specifically, what is the minimum number of regression parameters needed to compute the property?

Building on previous work, we introduce a new notion of elicitation complexity and lay the foundations for a calculus of elicitation. We establish several general results and techniques for proving upper and lower bounds on elicitation complexity. These results provide tight bounds for eliciting the Bayes risk of any loss, a large class of properties which includes spectral risk measures and several new properties of interest.
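The starting definition, that a property is the statistic computed by empirical risk minimization under some loss, can be illustrated with a small sketch (an illustrative grid search, not from the paper): squared loss elicits the mean and absolute loss elicits the median, each needing a single regression parameter.

```python
import numpy as np

def erm(data, loss, grid):
    """Return the grid point minimising the empirical risk (average loss)."""
    risks = [np.mean([loss(t, y) for y in data]) for t in grid]
    return grid[int(np.argmin(risks))]

data = np.array([0.0, 0.0, 1.0, 2.0, 10.0])
grid = np.linspace(0.0, 10.0, 1001)
mean_hat = erm(data, lambda t, y: (t - y) ** 2, grid)   # squared loss elicits the mean
median_hat = erm(data, lambda t, y: abs(t - y), grid)   # absolute loss elicits the median
```

Elicitation complexity then asks how many such parameters are needed for harder properties, for example the Bayes risks and spectral risk measures treated above.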

Joint work with Rafael Frongillo.

Normalizing Digital Trace Data

Date: 19 October 2017
Speaker: Andreas Jungherr
Abstract: Over the last ten years, social scientists have found themselves confronting a massive increase in available data sources. In the debates on how to use these new data, the research potential of “digital trace data” has featured prominently. While various commentators expect digital trace data to create a “measurement revolution”, empirical work has fallen somewhat short of these grand expectations. In fact, empirical research based on digital trace data is largely limited by the prevalence of two central fallacies: first, the n=all fallacy; second, the mirror fallacy. As I will argue, these fallacies can be addressed by developing a measurement theory for the use of digital trace data. For this, researchers will have to test the consequences of variations in research designs, account for sample problems arising from digital trace data, and explicitly link signals identified in digital trace data to sophisticated conceptualizations of social phenomena. In the talk, I will outline the two fallacies in greater detail. Then, I will discuss their consequences for three general areas of work with digital trace data in the social sciences: digital ethnography, proxies, and hybrids. In these sections, I will present selected prominent studies, predominantly from political communication research. I will close with a short assessment of the road ahead and of how these fallacies might be constructively addressed by the systematic development of a measurement theory for work with digital trace data in the social sciences.


Integrating Conflict Event Data

Date: Thursday 4th May 2017
Speaker: Karsten Donnay, University of Konstanz
Abstract: The growing volume of sophisticated event-level data collection, with improving geographic and temporal coverage, offers prospects for conducting novel analyses. In instances where multiple related datasets are available, researchers tend to rely on one at a time, ignoring the potential value of the multiple datasets in providing more comprehensive, precise, and valid measurement of empirical phenomena. If multiple datasets are used, integration is typically limited to manual efforts for select cases. We develop the conceptual and methodological foundations for automated, transparent and reproducible integration and disambiguation of multiple event datasets. We formally present the methodology, validate it with synthetic test data, and demonstrate its application using conflict event data for Africa, drawing on four leading sources (UCDP-GED, ACLED, SCAD, GTD). We show that whether analyses rely on one or multiple datasets can affect substantive findings with regard to key explanatory variables, thus highlighting the critical importance of systematic data integration.
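A drastically simplified sketch of the matching step such integration requires (the thresholds, function names, and greedy strategy are illustrative assumptions, not the authors' methodology): pair events from two datasets when they fall within spatial and temporal windows.

```python
import math
from datetime import date

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def match_events(a_events, b_events, max_km=25.0, max_days=1):
    """Greedy first-match pairing of (lat, lon, date) event records."""
    matches, used = [], set()
    for i, (lat1, lon1, d1) in enumerate(a_events):
        for j, (lat2, lon2, d2) in enumerate(b_events):
            if j in used:
                continue
            close_in_space = haversine_km(lat1, lon1, lat2, lon2) <= max_km
            close_in_time = abs((d1 - d2).days) <= max_days
            if close_in_space and close_in_time:
                matches.append((i, j))
                used.add(j)
                break
    return matches
```

A production method must additionally handle geo-precision codes, duplicates within a source, and ambiguous many-to-many candidates, which is where the disambiguation machinery described above comes in.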


The STEM requirements of "non-STEM" jobs: evidence from UK online vacancy postings and implications for Skills & Knowledge Shortages

Date: Thursday 23rd March 2017
Speaker: Inna Grinis, PhD candidate in the Department of Economics, LSE
Abstract: Do employers in “non-STEM” occupations (e.g. Graphic Designers, Economists) seek to hire STEM (Science, Technology, Engineering, and Mathematics) graduates with a higher probability than non-STEM ones for knowledge and skills that they have acquired through their STEM education (e.g. “Microsoft C#”, “Systems Engineering”), and not simply for their problem-solving and analytical abilities? This is an important question in the UK, where less than half of STEM graduates work in STEM occupations and where this apparent leakage from the “STEM pipeline” is often considered a waste of resources. To address it, this paper goes beyond the discrete divide of occupations into STEM vs. non-STEM and measures STEM requirements at the level of jobs by examining the universe of UK online vacancy postings between 2012 and 2016. We design and evaluate machine-learning algorithms that classify thousands of keywords collected from job adverts, and millions of vacancies, into STEM and non-STEM. We find that 35% of all STEM jobs belong to non-STEM occupations, and that 15% of all postings in non-STEM occupations are STEM. Moreover, STEM jobs are associated with higher wages within both STEM and non-STEM occupations, even after controlling for detailed occupations, education, experience requirements, employers, etc. Although our results indicate that the STEM pipeline breakdown may be less problematic than typically thought, we also find that many of the STEM requirements of “non-STEM” jobs could be acquired with STEM training that is less advanced than a full-time STEM education. Hence, a more efficient way of satisfying the STEM demand in non-STEM occupations could be to teach more STEM in non-STEM disciplines. We develop a simple abstract framework to show how this education policy could help reduce STEM shortages in both STEM and non-STEM occupations.



The Case for Research Preregistration, with Applications in Elections Research

Date: Thursday 23rd February 2017
Speaker: Prof. Jamie Monogan, Department of Political Science, University of Georgia
Abstract: Preregistration refers to when an analyst commits to a research design before observing the outcome. How can preregistration be useful for political scientists? This presentation makes the argument that, when appropriate, study registration increases honesty and transparency in research reporting in a way that benefits authors, reviewers, and readers. The essential element for preregistration to be useful is a clear public signal of the design before the data could possibly be observed, such as before an experiment is conducted or before an election occurs. The presentation therefore offers illustrations of how to implement preregistration that focus on American elections. The three examples are: an analysis of the immigration issue in 2010 U.S. House of Representatives races, the effect of the 2011 debt ceiling controversy on the 2012 U.S. House elections, and a yet-to-be-implemented design examining how anxiety shaped individual voters' decision-making in the 2016 U.S. presidential election.


Revealing the Anatomy of Vote Trading

Date: Thursday 9th February 2017
Speaker: Dr Omar Guerrero, Said Business School, University of Oxford.
Abstract: Cooperation in the form of vote trading, also known as logrolling, is central to law-making processes, shaping the development of democratic societies. Measuring vote trading is challenging because it happens behind closed doors and hence is not directly observable. Empirical evidence of logrolling is scarce and limited to highly specific situations because existing methods are not easily applicable to broader contexts. We have developed a general and scalable methodology for revealing a network of vote traders, allowing us to measure logrolling on a large scale. Analysis of more than 9 million votes spanning 40 years in the U.S. Congress reveals a higher prevalence of logrolling in the Senate and an overall decreasing trend over recent congresses, coinciding with high levels of political polarization. Our method is applicable in multiple contexts, shedding light on many aspects of logrolling and opening new doors in the study of hidden cooperation.


Fitting Hierarchical Models in Large-Scale Recommender Systems

Thursday 26 January 2017
Speaker: Professor Patrick Perry, Stern School of Business, New York University.
Abstract: Early in the development of recommender systems, hierarchical models were recognized as a tool capable of combining content-based filtering (recommending based on item-specific attributes) with collaborative filtering (recommending based on preferences of similar users). However, as recently as the late 2000s, many authors deemed the computational costs required to fit hierarchical models to be prohibitively high for commercial-scale settings. This talk addresses the challenge of fitting a hierarchical model at commercial scale by proposing a moment-based procedure for estimating the parameters of a hierarchical model. This procedure has its roots in a method originally introduced by Cochran in 1937. The method trades statistical efficiency for computational efficiency. It gives consistent parameter estimates, competitive prediction error performance, and substantial computational improvements. When applied to a large-scale recommender system application and compared to a standard maximum likelihood procedure, the method delivers competitive prediction performance while reducing computation time from hours to minutes. 
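In the same spirit, though as a generic sketch rather than Professor Perry's actual procedure, variance components of a balanced random-intercept model can be estimated from within- and between-group sums of squares in closed form, avoiding iterative maximum likelihood entirely (the function name and setup are assumptions for illustration):

```python
import numpy as np

def moment_variance_components(groups):
    """Method-of-moments estimates for y_ij = mu + u_i + e_ij with
    k balanced groups of size n: returns (sigma_e^2, sigma_u^2) estimates."""
    y = np.asarray(groups, dtype=float)      # shape (k, n)
    k, n = y.shape
    group_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Within-group variance estimates sigma_e^2 directly.
    s_within = ((y - group_means[:, None]) ** 2).sum() / (k * (n - 1))
    # Between-group mean square satisfies E[MS_between] = n*sigma_u^2 + sigma_e^2.
    ms_between = n * ((group_means - grand_mean) ** 2).sum() / (k - 1)
    s_between = max((ms_between - s_within) / n, 0.0)
    return s_within, s_between
```

Each quantity is a single pass of closed-form sums over the data, which illustrates where the hours-to-minutes speedup of moment-style estimators over likelihood-based fitting can come from.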


Detecting the Structure and Dynamics of Political Concepts from Text

Date: 8 December 2016
Speaker: Dr Paul Nulty, Research Associate, Cambridge Language Sciences at the University of Cambridge
Abstract: The availability of large archives of digitised political text offers new opportunities for analysing the emergence and formation of political concepts. This talk describes new methods for discovering the structure of abstract political concepts from large text corpora. Working in a theoretical framework that treats concepts as cultural entities that can be studied through patterns of lexical behaviour (De Bolla, 2013), Dr Nulty outlines several methods from computational linguistics that enable researchers to discover the architecture of political concepts from text. At the level of the sentence, grammatical relation parsing (dependency parsing) is used to extract the predicates and propositions that compose complex concepts. Beyond the sentence level, he describes a weighted mutual-information measure calculated from long-range co-occurrences to discover looser conceptual associations that might not occur in a predicating relation with the central concept. Finally, he presents several examples from historical corpora of traces of the origin and structure of political concepts, and of how these have changed over time.
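The co-occurrence idea can be sketched with a toy pointwise mutual information (PMI) calculation (a hypothetical helper, not Dr Nulty's weighted measure or implementation; probabilities are simple relative frequencies):

```python
import math
from collections import Counter

def build_pmi(sentences, window=10):
    """Estimate PMI between word pairs from co-occurrences within a window."""
    word_counts, pair_counts = Counter(), Counter()
    total_words = 0
    for sent in sentences:
        tokens = sent.lower().split()
        total_words += len(tokens)
        word_counts.update(tokens)
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                pair_counts[tuple(sorted((w, v)))] += 1
    total_pairs = sum(pair_counts.values())

    def pmi(a, b):
        # PMI = log2( p(a, b) / (p(a) * p(b)) )
        n_ab = pair_counts[tuple(sorted((a, b)))]
        if n_ab == 0:
            return float("-inf")
        p_ab = n_ab / total_pairs
        p_a = word_counts[a] / total_words
        p_b = word_counts[b] / total_words
        return math.log2(p_ab / (p_a * p_b))

    return pmi
```

High PMI flags pairs that co-occur far more often than chance would predict, surfacing conceptual associations even where no direct grammatical link exists.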


Measuring and explaining political sophistication through textual complexity

Date: 24 November 2016
Speaker: Professor Ken Benoit, Department of Methodology at the LSE (with Kevin Munger and Arthur Spirling)
Abstract: The sophistication of political communication has been measured using “readability” scores developed in other contexts, but their application to political text suffers from a number of theoretical and practical issues. We develop a new benchmark of textual complexity which is better suited to the task of determining political sophistication. We use the crowd to perform tens of thousands of pairwise comparisons of snippets of State of the Union Addresses, scale these results into an underlying measure of reading ease, and “learn” which features of the texts are most associated with higher levels of sophistication, including linguistic markers, parts of speech, and a baseline of word frequency relative to 210 years of the Google Books ngram corpus. Our refitting of the readability model not only shows which features are appropriate to the political domain and how, but also provides a measure easily applied and rescaled to political texts in a way that facilitates comparison with reference to a meaningful baseline.
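The step that scales pairwise comparisons into an underlying ease score resembles fitting a Bradley-Terry model; a minimal sketch follows (the MM fitting routine and toy data are illustrative, not the authors' crowd pipeline):

```python
import numpy as np

def bradley_terry(n_items, comparisons, iters=200):
    """Bradley-Terry strengths via the standard MM (Zermelo) updates.
    comparisons: (winner, loser) index pairs, e.g. 'snippet w read as easier'."""
    wins = np.zeros(n_items)
    for w, _ in comparisons:
        wins[w] += 1.0
    p = np.ones(n_items) / n_items
    for _ in range(iters):
        denom = np.zeros(n_items)
        for w, l in comparisons:
            d = 1.0 / (p[w] + p[l])   # each game contributes to both players
            denom[w] += d
            denom[l] += d
        p = wins / denom
        p /= p.sum()                  # normalise for identifiability
    return p
```

The fitted strengths put every snippet on a common latent scale of reading ease, which is what makes the subsequent feature-learning regression possible.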