SEDS seminar

Past SEDS Seminars

A Measure of Survey Mode Differences
Date: 
Tuesday 5 December 2017
Speaker: 
Jeff Gill, Department of Government, American University, Washington DC
Abstract: 
Jeff will evaluate the effects of different survey modes on respondents' patterns of answers using an entropy measure of variability. While measures of centrality show little difference between face-to-face and Internet surveys, he will show strong distributional differences between these modes: Internet responses tend towards more diffuse positions, owing to the lack of personal contact during the survey process and of the social forces that a face-to-face format provides. The results provide clear evidence that mode matters in modern survey research, and he will make recommendations for interpreting results from different modes.
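As a rough illustration of the kind of entropy comparison described above (not code from the talk; the 5-point item and the response counts are invented), a short Python sketch:

```python
import numpy as np

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a discrete response distribution."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                       # drop empty categories
    return float(-(p * np.log2(p)).sum())

# Hypothetical answer counts for one 5-point item under two survey modes.
face_to_face = [5, 10, 60, 15, 10]     # answers cluster on the midpoint
internet     = [15, 20, 30, 20, 15]    # more diffuse pattern of answers

print(f"face-to-face entropy: {shannon_entropy(face_to_face):.3f} bits")
print(f"internet entropy:     {shannon_entropy(internet):.3f} bits")
# Both distributions have a similar central tendency, but the internet
# distribution has higher entropy, i.e. more dispersed responses.
```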

Controlling Bias in Artificial Intelligence with Nisheeth Vishnoi and Elisa Celis
Date: 
Wednesday 29 November 2017
Time: 
3:30 - 5pm 
Speakers: 
Nisheeth Vishnoi and Elisa Celis
Abstract: 
Bias is an increasingly observed phenomenon in the world of artificial intelligence (AI) and machine learning: from gender bias in online search to racial bias in court bail pleas to biases in worldviews depicted in personalized newsfeeds. How are societal biases creeping into the seemingly “objective” world of computers and programs? At the core, what is powering today’s AI are algorithms for fundamental computational problems such as classification, data summarization, and online learning. Such algorithms have traditionally been designed with the goal of maximizing some notion of “utility”, and identifying or controlling bias in their output has not been a consideration. In this talk, Nisheeth and Elisa will explain the emergence of bias in algorithmic decision making and present the first steps towards developing a systematic framework to control biases in several of the aforementioned problems. This leads to new algorithms that have the ability to control and alleviate bias, often without a significant compromise to the utility that the current algorithms obtain.
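The following toy sketch (not the speakers' algorithms; the items, groups, and utility scores are invented) illustrates the basic tension the talk addresses: a purely utility-maximizing data summarization versus one with a simple per-group constraint, at a small cost in total utility.

```python
from collections import defaultdict

# Hypothetical candidate items with a utility score and a group attribute.
items = [
    {"id": "a", "group": "g1", "utility": 0.95},
    {"id": "b", "group": "g1", "utility": 0.90},
    {"id": "c", "group": "g1", "utility": 0.88},
    {"id": "d", "group": "g2", "utility": 0.70},
    {"id": "e", "group": "g2", "utility": 0.65},
]

def summarize(items, k, min_per_group=0):
    """Pick k items by utility, optionally enforcing a per-group floor."""
    chosen = []
    if min_per_group:
        by_group = defaultdict(list)
        for it in sorted(items, key=lambda it: -it["utility"]):
            by_group[it["group"]].append(it)
        for group_items in by_group.values():
            chosen.extend(group_items[:min_per_group])
    rest = [it for it in sorted(items, key=lambda it: -it["utility"])
            if it not in chosen]
    chosen.extend(rest[: k - len(chosen)])
    return [it["id"] for it in chosen]

print("unconstrained summary:   ", summarize(items, k=3))
print("with a group floor of 1: ", summarize(items, k=3, min_per_group=1))
```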

Econometrics for Learning Agents with Vasilis Syrgkanis
Date: 23 November 2017
Speaker: Vasilis Syrgkanis, Microsoft
Abstract: The traditional econometrics approach for inferring properties of strategic interactions that are not fully observable in the data relies heavily on the assumption that the observed strategic behavior has settled at an equilibrium. This assumption is not robust in complex economic environments such as online markets, where players are typically unaware of all the parameters of the game in which they are participating, but rather only learn their utility after taking an action. Behavioral models from online learning theory have recently emerged as an attractive alternative to the equilibrium assumption and have been extensively analyzed from a theoretical standpoint in the algorithmic game theory literature over the past decade. In this talk, Vasilis will present recent work in which he takes a learning-agent approach to econometrics, i.e. infers properties of the game, such as private valuations or efficiency of allocation, by assuming only that the observed repeated behavior is the outcome of a no-regret learning algorithm, rather than a static equilibrium. He will also present some empirical results from applying these methods to datasets from Microsoft’s sponsored search auction system.

Joint work with Denis Nekipelov, Eva Tardos and Yichen Wang.
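A toy sketch of the general idea (not the estimator from the paper): assume a bidder's observed bids in a repeated auction were generated by a no-regret algorithm, and recover the set of private values under which those bids have small regret. The simulated first-price-style data, the bid grid, and the tolerance eps are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated repeated auction data for one bidder: their bids and the
# highest competing bid in each round (both invented for illustration).
bids      = rng.uniform(0.3, 0.6, size=200)
competing = rng.uniform(0.0, 1.0, size=200)

def avg_utility(value, my_bids, competing_bids):
    """Average realised utility of the observed bids for a candidate value."""
    return np.mean(np.where(my_bids > competing_bids, value - my_bids, 0.0))

def regret(value, my_bids, competing_bids, grid=np.linspace(0, 1, 101)):
    """Regret of the observed bids against the best fixed bid in hindsight."""
    best_fixed = max(
        np.mean(np.where(b > competing_bids, value - b, 0.0)) for b in grid
    )
    return best_fixed - avg_utility(value, my_bids, competing_bids)

# 'Rationalizable' private values: those under which the observed play
# is consistent with eps-regret learning.
eps = 0.05
consistent = [v for v in np.linspace(0, 1, 101)
              if regret(v, bids, competing) <= eps]
if consistent:
    print(f"values consistent with {eps}-regret play: "
          f"[{min(consistent):.2f}, {max(consistent):.2f}]")
```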

On Elicitation Complexity
Date: 
26th October 2017
Speaker: Ian Kash, Microsoft Research
Abstract: Elicitation is the study of statistics or properties which are computable via empirical risk minimization. This has applications in understanding which loss function to use in a regression for a particular statistic or finding a surrogate loss function which is easier to optimize.

While several recent papers have approached the general question of which properties are elicitable, we suggest that this is the wrong question: all properties are elicitable by first eliciting the entire distribution or data set, and thus the important question is how elicitable. Specifically, what is the minimum number of regression parameters needed to compute the property?

Building on previous work, we introduce a new notion of elicitation complexity and lay the foundations for a calculus of elicitation. We establish several general results and techniques for proving upper and lower bounds on elicitation complexity. These results provide tight bounds for eliciting the Bayes risk of any loss (a large class of properties that includes spectral risk measures) and several new properties of interest.

http://www.cs.colorado.edu/~raf/media/papers/elic-complex.pdf

Joint work with Rafael Frongillo.
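As a minimal illustration of elicitation via empirical risk minimization (not taken from the paper; the sample is simulated), squared loss elicits the mean and absolute loss elicits the median:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)   # a skewed sample

def erm(loss, sample):
    """Report the value minimizing empirical risk under the given loss."""
    res = minimize_scalar(lambda r: np.mean(loss(r, sample)),
                          bounds=(sample.min(), sample.max()),
                          method="bounded")
    return res.x

squared  = lambda r, y: (r - y) ** 2        # elicits the mean
absolute = lambda r, y: np.abs(r - y)       # elicits the median

print(f"ERM under squared loss:  {erm(squared, data):.3f} (mean   = {data.mean():.3f})")
print(f"ERM under absolute loss: {erm(absolute, data):.3f} (median = {np.median(data):.3f})")
```

Properties such as the variance are not directly elicitable by any single-parameter loss; they are elicited jointly with the mean, which is the kind of question elicitation complexity makes precise.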

Normalizing Digital Trace Data

Date: 19 October 2017
Speaker: Andreas Jungherr
Abstract: Over the last ten years, social scientists have found themselves confronting a massive increase in available data sources. In the debates on how to use these new data, the research potential of “digital trace data” has featured prominently. While various commentators expect digital trace data to create a “measurement revolution”, empirical work has fallen somewhat short of these grand expectations. In fact, empirical research based on digital trace data is largely limited by the prevalence of two central fallacies: first, the n=all fallacy; second, the mirror fallacy. As I will argue, these fallacies can be addressed by developing a measurement theory for the use of digital trace data. For this, researchers will have to test the consequences of variations in research designs, account for sample problems arising from digital trace data, and explicitly link signals identified in digital trace data to sophisticated conceptualizations of social phenomena. I will outline the two fallacies in greater detail and then discuss their consequences with regard to three general areas in the work with digital trace data in the social sciences: digital ethnography, proxies, and hybrids. Throughout, I will present selected prominent studies, predominantly from political communication research. I will close with a short assessment of the road ahead and of how these fallacies might be constructively addressed by the systematic development of a measurement theory for the work with digital trace data in the social sciences.

Integrating Conflict Event Data

Date: 
Thursday 4th May 2017
Speaker: 
Karsten Donnay, University of Konstanz
Abstract: 
The growing volume of sophisticated event-level data collection, with improving geographic and temporal coverage, offers prospects for conducting novel analyses. In instances where multiple related datasets are available, researchers tend to rely on one at a time, ignoring the potential value of the multiple datasets in providing more comprehensive, precise, and valid measurement of empirical phenomena. If multiple datasets are used, integration is typically limited to manual efforts for select cases. We develop the conceptual and methodological foundations for automated, transparent and reproducible integration and disambiguation of multiple event datasets. We formally present the methodology, validate it with synthetic test data, and demonstrate its application using conflict event data for Africa, drawing on four leading sources (UCDP-GED, ACLED, SCAD, GTD). We show that whether analyses rely on one or multiple datasets can affect substantive findings with regard to key explanatory variables, thus highlighting the critical importance of systematic data integration.
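A minimal sketch of the kind of matching step such integration involves (not the authors' procedure; the two toy event tables and the 5 km / 2 day windows are invented): pair reports from two sources that fall within a spatial and temporal window and treat them as candidate duplicates.

```python
import numpy as np
import pandas as pd

# Two hypothetical conflict-event tables with coordinates and dates.
a = pd.DataFrame({"id": ["a1", "a2"],
                  "lat": [9.05, 4.85], "lon": [7.49, 31.58],
                  "date": pd.to_datetime(["2015-03-01", "2015-03-10"])})
b = pd.DataFrame({"id": ["b1", "b2"],
                  "lat": [9.06, 0.32], "lon": [7.48, 32.58],
                  "date": pd.to_datetime(["2015-03-02", "2015-03-15"])})

MAX_KM, MAX_DAYS = 5.0, 2      # illustrative spatial and temporal windows

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(h))

matches = [(ra.id, rb.id)
           for ra in a.itertuples() for rb in b.itertuples()
           if haversine_km(ra.lat, ra.lon, rb.lat, rb.lon) <= MAX_KM
           and abs((ra.date - rb.date).days) <= MAX_DAYS]
print(matches)   # candidate duplicate reports across the two sources
```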

The STEM requirements of "non-STEM" jobs: evidence from UK online vacancy postings and implications for Skills & Knowledge Shortages

Date: Thursday 23rd March 2017
Speaker: Inna Grinis, PhD candidate in the Department of Economics, LSE
Abstract: Do employers in “non-STEM” occupations (e.g. Graphic Designers, Economists) seek to hire STEM (Science, Technology, Engineering, and Mathematics) graduates with a higher probability than non-STEM ones for knowledge and skills that they have acquired through their STEM education (e.g. “Microsoft C#”, “Systems Engineering”) and not simply for their problem-solving and analytical abilities? This is an important question in the UK, where less than half of STEM graduates work in STEM occupations and where this apparent leakage from the “STEM pipeline” is often considered a waste of resources. To address it, this paper goes beyond the discrete divide of occupations into STEM vs. non-STEM and measures STEM requirements at the level of jobs by examining the universe of UK online vacancy postings between 2012 and 2016. We design and evaluate machine learning algorithms that classify thousands of keywords collected from job adverts and millions of vacancies into STEM and non-STEM. We find that 35% of all STEM jobs belong to non-STEM occupations and 15% of all postings in non-STEM occupations are STEM. Moreover, STEM jobs are associated with higher wages within both STEM and non-STEM occupations, even after controlling for detailed occupations, education, experience requirements, employers, etc. Although our results indicate that the STEM pipeline breakdown may be less problematic than typically thought, we also find that many of the STEM requirements of “non-STEM” jobs could be acquired with STEM training that is less advanced than a full-time STEM education. Hence, a more efficient way of satisfying the STEM demand in non-STEM occupations could be to teach more STEM in non-STEM disciplines. We develop a simple abstract framework to show how this education policy could help reduce STEM shortages in both STEM and non-STEM occupations.
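A minimal sketch of classifying vacancy text as STEM versus non-STEM with an off-the-shelf bag-of-words classifier (not the paper's algorithms; the handful of labelled adverts is invented and far too small for real use):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented training set: vacancy snippets labelled STEM (1) or non-STEM (0).
ads = [
    "C# developer with systems engineering experience",          # STEM
    "data analyst, Python programming and statistics required",  # STEM
    "graphic designer for marketing campaigns",                  # non-STEM
    "retail assistant with a customer service focus",            # non-STEM
]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(ads, labels)

# Score new postings; in the paper this is done for millions of vacancies.
new_ads = ["economist with econometrics and R programming",
           "receptionist with front-desk and customer service duties"]
for ad, p in zip(new_ads, clf.predict_proba(new_ads)[:, 1]):
    print(f"P(STEM) = {p:.2f} | {ad}")
```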

The Case for Research Preregistration, with Applications in Elections Research

Date: 
Thursday 23rd February 2017
Speaker: 
Prof. Jamie Monogan, Department of Political Science, University of Georgia.
Abstract: 
Preregistration refers to when an analyst commits to a research design before observing the outcome. How can preregistration be useful for political scientists? This presentation makes the argument that, when appropriate, study registration increases honesty and transparency in research reporting in a way that benefits authors, reviewers, and readers. The essential element for preregistration to be useful is a clear public signal of the design before the data could possibly be observed, such as before an experiment is conducted or before an election occurs. This presentation therefore offers illustrations of how to implement preregistration that focus on American elections. The three examples are: an analysis of the immigration issue in 2010 U.S. House of Representatives races, the effect of the 2011 debt ceiling controversy on the 2012 U.S. House elections, and a yet-to-be-implemented design of how anxiety shaped individual voters' decision-making process in the 2016 U.S. presidential election.

Revealing the Anatomy of Vote Trading

Date: Thursday 9th February 2017
Speaker: Dr Omar Guerrero, Said Business School, University of Oxford.
Abstract: Cooperation in the form of vote trading, also known as logrolling, is central to law-making processes, shaping the development of democratic societies. Measuring vote trading is challenging because it happens behind closed doors and is therefore not directly observable. Empirical evidence of logrolling is scarce and limited to highly specific situations because existing methods are not easily applicable to broader contexts. We have developed a general and scalable methodology for revealing a network of vote traders, allowing us to measure logrolling on a large scale. Analysis of more than 9 million votes spanning 40 years in the U.S. Congress reveals a higher logrolling prevalence in the Senate and an overall decreasing trend over recent congresses, coinciding with high levels of political polarization. Our method is applicable in multiple contexts, shedding light on many aspects of logrolling and opening new doors in the study of hidden cooperation.

Fitting Hierarchical Models in Large-Scale Recommender Systems

Date:
Thursday 26 January 2017
Speaker: Professor Patrick Perry, Stern School of Business, New York University.
Abstract: Early in the development of recommender systems, hierarchical models were recognized as a tool capable of combining content-based filtering (recommending based on item-specific attributes) with collaborative filtering (recommending based on preferences of similar users). However, as recently as the late 2000s, many authors deemed the computational costs required to fit hierarchical models to be prohibitively high for commercial-scale settings. This talk addresses the challenge of fitting a hierarchical model at commercial scale by proposing a moment-based procedure for estimating its parameters. This procedure has its roots in a method originally introduced by Cochran in 1937. The method trades statistical efficiency for computational efficiency: it gives consistent parameter estimates, competitive prediction error performance, and substantial computational improvements. When applied to a large-scale recommender system and compared to a standard maximum likelihood procedure, the method delivers competitive prediction performance while reducing computation time from hours to minutes.
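In the spirit of the moment-based approach described above, a toy random-intercept example (simulated data; not the paper's estimator for full hierarchical regressions): the variance components come from simple within-group and between-group moments, with no iterative likelihood maximisation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated ratings: each "user" has a latent offset around a global mean.
n_users, n_per_user = 500, 20
sigma_u, sigma_e = 0.8, 1.5                      # true variance components
user_effect = rng.normal(0, sigma_u, n_users)
ratings = 3.0 + user_effect[:, None] + rng.normal(0, sigma_e, (n_users, n_per_user))

# Moment-based estimation of the variance components.
grand_mean  = ratings.mean()
user_means  = ratings.mean(axis=1)
within_var  = ratings.var(axis=1, ddof=1).mean()                  # estimates sigma_e^2
between_var = user_means.var(ddof=1) - within_var / n_per_user    # estimates sigma_u^2

# Empirical-Bayes style shrinkage of each user's offset toward the grand mean.
shrink = between_var / (between_var + within_var / n_per_user)
user_offsets = shrink * (user_means - grand_mean)

print(f"sigma_e^2: true {sigma_e**2:.2f}, estimated {within_var:.2f}")
print(f"sigma_u^2: true {sigma_u**2:.2f}, estimated {between_var:.2f}")
print(f"shrinkage factor: {shrink:.2f}")
print("first three shrunken user offsets:", np.round(user_offsets[:3], 2))
```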

Detecting the Structure and Dynamics of Political Concepts from Text

Date: 8 December 2016
Speaker: Dr Paul Nulty, Research Associate, Cambridge Language Sciences at the University of Cambridge
Abstract: The availability of large archives of digitised political text offers new opportunities for analysing the emergence and formation of political concepts. This talk described new methods for discovering the structure of abstract political concepts from large text corpora. Working in a theoretical framework that treats concepts as cultural entities that can be studied through patterns of lexical behaviour (De Bolla, 2013), Dr Nulty outlined several methods from computational linguistics that enable researchers to discover the architecture of political concepts from text. At the level of the sentence, grammatical relation parsing (dependency parsing) was used to extract predicates and propositions that compose complex concepts. Beyond the sentence level, Paul described a weighted mutual-information measure calculated from long-range co-occurrences to discover looser conceptual associations that might not occur in a predicating relation with the central concept. Finally, he presented several examples from historical corpora of traces of the origin and structure of political concepts, and how these have changed over time.
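A small sketch of the co-occurrence idea (not Dr Nulty's exact measure; the four-sentence corpus and the window size are invented): score the association between a concept word and its neighbours by pointwise mutual information over a wide co-occurrence window.

```python
import math
from collections import Counter

# A tiny invented corpus; in practice this would be a large historical archive.
sentences = [
    "liberty and property are the foundation of the constitution",
    "the constitution protects liberty and the rights of the people",
    "parliament debated the rights of property and taxation",
    "the people demanded liberty from arbitrary power",
]

WINDOW = 10   # long-range co-occurrence window (tokens); illustrative value
tokens = [s.split() for s in sentences]

word_counts = Counter(w for sent in tokens for w in sent)
pair_counts = Counter()
for sent in tokens:
    for i, w in enumerate(sent):
        for v in sent[i + 1:i + 1 + WINDOW]:
            pair_counts[tuple(sorted((w, v)))] += 1

total_words = sum(word_counts.values())
total_pairs = sum(pair_counts.values())

def pmi(w, v):
    """Pointwise mutual information between two words under the window model."""
    p_wv = pair_counts[tuple(sorted((w, v)))] / total_pairs
    p_w, p_v = word_counts[w] / total_words, word_counts[v] / total_words
    return math.log2(p_wv / (p_w * p_v)) if p_wv > 0 else float("-inf")

for other in ["constitution", "rights", "taxation"]:
    print(f"PMI(liberty, {other}) = {pmi('liberty', other):.2f}")
```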

Measuring and explaining political sophistication through textual complexity

Date: 24 November 2016
Speaker: Professor Ken Benoit, Department of Methodology at the LSE (with Kevin Munger and Arthur Spirling)
Abstract: The sophistication of political communication has been measured using "readability" scores developed in other contexts, but their application to political text suffers from a number of theoretical and practical issues. We develop a new benchmark of textual complexity which is better suited to the task of determining political sophistication. We use the crowd to perform tens of thousands of pairwise comparisons of snippets of State of the Union Addresses, scale these results into an underlying measure of reading ease, and "learn" which features of the texts are most associated with higher levels of sophistication, including linguistic markers, parts of speech, and a baseline of word frequency relative to 210 years of the Google Books ngram corpus. Our refitting of the readability model not only shows which features are appropriate to the political domain and how, but also provides a measure easily applied and rescaled to political texts in a way that facilitates comparison with reference to a meaningful baseline.
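A stylized sketch of the pipeline described above, with simulated crowd judgements and invented surface features rather than the paper's data: scale pairwise "which snippet is easier" comparisons into a latent ease score with a Bradley-Terry-style logistic regression, then relate the recovered scores to text features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(3)

# Hypothetical setup: 6 text snippets with a latent "reading ease" score
# and simple surface features (mean sentence length, rare-word share).
n_texts = 6
latent_ease = np.array([2.0, 1.2, 0.5, 0.0, -0.8, -1.5])
features = np.column_stack([
    20 - 3 * latent_ease + rng.normal(0, 1, n_texts),          # mean sentence length
    0.2 - 0.05 * latent_ease + rng.normal(0, 0.01, n_texts),   # share of rare words
])

# Simulated crowd judgements: which of two snippets is easier to read.
pairs = [(i, j) for i in range(n_texts) for j in range(i + 1, n_texts)
         for _ in range(20)]
X, y = [], []
for i, j in pairs:
    x = np.zeros(n_texts)
    x[i], x[j] = 1.0, -1.0
    p_i_easier = 1 / (1 + np.exp(latent_ease[j] - latent_ease[i]))
    X.append(x)
    y.append(rng.random() < p_i_easier)

# Bradley-Terry scaling via penalised logistic regression (no intercept):
# the coefficients recover each snippet's ease score up to location and scale.
bt = LogisticRegression(fit_intercept=False, C=10.0).fit(np.array(X), np.array(y))
scores = bt.coef_.ravel()

# Relate the recovered scores to the surface features.
fit = LinearRegression().fit(features, scores)
print("estimated ease scores:", np.round(scores, 2))
print("feature weights (sentence length, rare words):", np.round(fit.coef_, 3))
```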