Innovative data and methods abstracts

Strand organisers: Francesco C. Billari & Ridhi Kashyap, University of Oxford 

Innovative Data and Models: Tuesday, 8th September - 9.00am

Twitter as a novel source of mobility indicators
Nigel Swier, Office for National Statistics 

Twitter is a global micro-blogging platform. Users “tweeting” from a GPS-enabled smart device can opt to give their precise location. Although fewer than 2 per cent of tweets are geo-located, around 500,000 geo-located tweets are sent each day within the British Isles. This paper explores whether these data could provide fresh insights into patterns of mobility and migration. The focus is on the development of methods for processing these data so that they can be mapped to common population concepts, in particular, usual residence. A spatial clustering algorithm (DBSCAN) is used to define frequently visited locations, or ‘anchor points’. Information about each cluster, for example whether it is at a residential location, is then derived from other address data. The dominant residential anchor point is assumed to be the most likely location of usual residence, and a change in an individual’s dominant cluster is assumed to indicate a change of residence. These moves are then aggregated to indicate flows between areas. Analysis of net flows by local authority and month shows a striking pattern for student areas that follows the cycle of the academic year. This is not something that can be observed using existing sources. These indicators could be used for quality assuring the mid-year population estimates (MYEs). It might also be feasible to produce monthly population estimates by inferring socio-demographic variables from the Twitter data and calibrating to the MYEs.
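The anchor-point step described above can be illustrated with a minimal sketch. It is not the author's implementation: the coordinates are invented planar positions in metres, the `eps` and `min_pts` values are arbitrary, and the residential check against address data is omitted, so the "dominant" cluster here is simply the largest one.

```python
from collections import Counter
from math import hypot

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on (x, y) points; returns one cluster label per point."""
    labels = [None] * len(points)

    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep expanding
    return labels


def dominant_anchor(points, labels):
    """Centroid of the largest cluster, or None if every point is noise."""
    counts = Counter(label for label in labels if label != NOISE)
    if not counts:
        return None
    top, _ = counts.most_common(1)[0]
    members = [p for p, label in zip(points, labels) if label == top]
    cx = sum(x for x, _ in members) / len(members)
    cy = sum(y for _, y in members) / len(members)
    return (cx, cy)


# Invented tweet locations for one user: a "home" group, a "work" group, one stray.
tweets = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5),
          (1000, 1000), (1010, 1000), (1000, 1010),
          (5000, 5000)]
labels = dbscan(tweets, eps=50, min_pts=3)
home = dominant_anchor(tweets, labels)
```

Repeating this per user and per period, and comparing successive dominant anchors, would give the candidate moves that the abstract aggregates into flows.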


Statistical analysis of an agent-based model of inter-generational fertility patterns using Gaussian Process Emulators
Jason Hilton, Jakub Bijak, University of Southampton   

Research questions: 1. What behavioural mechanisms are consistent with a relationship between cohort size and fertility outcomes, and why might such a relationship fail? 2. What role might relative well-being between generations and between peers play in creating or destroying such a relationship? Methods: An agent-based model was constructed linking individual fertility decisions to labour market outcomes and level of well-being relative to both parents and peers (Easterlin 1966). Agents wish to postpone childbearing until they reach an acceptable level of well-being, defined either with respect to that present during their upbringing, or versus a network of peers. The relationship between fertility outcomes and the parameters of the agent-based model is analysed using statistical emulation techniques. These link simulation inputs to outputs by the use of uncertain functions, allow quantification of all sources of uncertainty, and facilitate the calibration of the model to empirical data. Data Sources: The American Community Survey and US Census data are used to parameterise and evaluate the model. Applications: The combination of uncertainty quantification methods with agent-based models has the potential to allow coherent approaches to policy making in situations where the relationship between micro-level behaviour and macro-level outcomes is complex. Results: Preliminary results show that the model is able to replicate the expected periodic fertility cycles (derived mathematically by, for example, Ronald Lee (1974)) through the micro-level mechanisms described above.
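As a rough illustration of statistical emulation, the sketch below fits a one-dimensional Gaussian process to a handful of runs of a cheap stand-in function and then predicts, with a variance, at untried inputs. Everything in it is assumed for illustration: the `simulator` function is a toy and not the agent-based model, and the squared-exponential kernel, its length scale and the design points are arbitrary choices.

```python
from math import exp, sin, sqrt


def rbf(a, b, length, var):
    """Squared-exponential covariance between two scalar inputs."""
    return var * exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_emulator(xs, ys, length=1.0, var=1.0, jitter=1e-8):
    """Return predict(x) -> (mean, variance) for a zero-mean GP fitted to (xs, ys)."""
    K = [[rbf(a, b, length, var) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))

    def predict(x):
        k = [rbf(x, a, length, var) for a in xs]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve_lower(L, k)
        variance = max(var - sum(vi * vi for vi in v), 0.0)
        return mean, variance

    return predict


def simulator(theta):
    """Toy stand-in for an expensive simulation run (purely illustrative)."""
    return sin(3.0 * theta) + theta


design = [0.0, 0.25, 0.5, 0.75, 1.0]          # assumed design points
runs = [simulator(theta) for theta in design]  # one "simulation" per point
emulate = fit_emulator(design, runs, length=0.3)
```

Calling `emulate(0.6)` returns a predicted output and a variance for an input that was never simulated; the variance shrinks towards zero at the design points and returns to the prior variance far from them.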


The risky business of asking for help: an agent-based model of unmet need
Jonathan Gray, Jakub Bijak, Seth Bullock, University of Southampton

In this work we present an agent-based model of elderly care in which populations of decision-theoretic agents play a game reflecting the interwoven supply- and demand-side decision-making processes that govern whether older adults seek, and receive, support in their activities of daily living. The model draws together longitudinal survey (ELSA) data to provide base rates of need for support, care costs from local authority activity reports (HSCIC PSSE/PSSA), and attitude surveys (ONS OPN, EuroBarometer, and ESS) to produce distributions of synthetic agent psychologies. We then calibrate the model against reported rates of unmet need from the ELSA dataset, building statistical emulators of the simulation model to explore the free parameter space rapidly. The simulation results suggest that the care system is most sensitive to the balance between the perceived costs of failing to provide care where needed and the rewards of delivering appropriate support. Furthermore, the model indicates that the real system lies close to collapse, with relatively small decreases in perceived costs and rewards leading to breakdown. Potential applications of the simulation itself lie in policy development, where it can suggest possible implications of interventions, for example the impact of increases in the cost of care provision, or of campaigns targeting the perception of stigma attached to age. In addition, the parameterisation and calibration of the model demonstrate the possibilities of simulation as a method for integrating disparate data sources.


The impact of immigration on long-term and transient population dynamics: case study of Germany
Anna Zincenko¹, Jakub Bijak², Thomas Ezard¹; ¹Biological Sciences, University of Southampton, ²Department of Social Statistics and Demography, University of Southampton

We propose a simple female matrix population model that includes emigration and immigration. In this model we aggregate emigration with mortality and treat immigration as a given external inflow. The main assumption is that the children of migrants born in Germany belong to the native population, so that their demographic parameters coincide with those of the native population. The data were obtained from Destatis; data reconstruction was performed to disaggregate them to the required level of detail, especially with respect to age groups. We show that constant growth of the migrant inflow generates large transient fluctuations in the German-born and migrant-born populations. In particular, for the population of Germany we calculate the critical growth rate lambda of the immigration flow at which the proportion of migrants in Germany reaches 50% after some time period. This critical rate does not exceed 1.03, in other words growth of 3% per year, and its value is not sensitive to reasonable changes in the age structure of the migrant inflow. On the other hand, the time at which the fraction of migrants reaches 50%, for lambda greater than the critical value, is very sensitive to the value of lambda. In methodological terms, the proposed model allows the transient fluctuations of the population age structure to be calculated and the time to stabilisation in the presence of migration to be estimated. Further work includes relaxing the assumptions on the constancy of vital rates and migration streams, and carrying out a more in-depth sensitivity analysis.
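The structure of such a model can be sketched in a few lines. This is a toy under stated assumptions, not the authors' model: it uses four 20-year age classes, invented survival and fertility values (`SURV`, `FERT`), an invented starting population, and an inflow that enters only the second age class and grows by a factor `growth` per 20-year step rather than per year. Survival here stands for the combined effect of mortality and emigration, and all births are assigned to the native-born population, as in the abstract.

```python
# Illustrative rates only; none of these numbers come from Destatis.
SURV = [0.99, 0.97, 0.90]      # survival (net of emigration) between age classes
FERT = [0.0, 0.6, 0.1, 0.0]    # daughters per woman per step, by age class


def step(native, migrant, inflow):
    """Advance both female populations by one time step."""
    everyone = [n + g for n, g in zip(native, migrant)]
    births = sum(f * x for f, x in zip(FERT, everyone))
    # Children of migrants are counted as native-born.
    new_native = [births] + [s * n for s, n in zip(SURV, native)]
    aged_migrants = [0.0] + [s * g for s, g in zip(SURV, migrant)]
    new_migrant = [g + m for g, m in zip(aged_migrants, inflow)]
    return new_native, new_migrant


def time_to_half(growth, horizon=60):
    """First step at which migrants are at least half the population, else None."""
    native = [100.0, 100.0, 100.0, 100.0]   # assumed starting population
    migrant = [0.0, 0.0, 0.0, 0.0]
    for t in range(horizon):
        inflow = [0.0, growth ** t, 0.0, 0.0]   # geometric growth of the inflow
        native, migrant = step(native, migrant, inflow)
        share = sum(migrant) / (sum(native) + sum(migrant))
        if share >= 0.5:
            return t + 1
    return None


constant_inflow = time_to_half(1.0)   # inflow does not grow
fast_growth = time_to_half(1.5)
faster_growth = time_to_half(2.0)
```

With these toy numbers a constant inflow never brings the migrant share to 50% within the horizon, while growing inflows do, and sooner the faster the growth; scanning `growth` between such values is one way to locate a critical rate numerically.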


Innovative data & models for sub-national level population research: Tuesday, 8th September - 5.00pm   

Neighborhood trajectories and the ethnic population composition: moving beyond standard administrative boundaries
Merle Zwiers¹, David Manley², Maarten van Ham¹; ¹Faculty of Architecture and the Built Environment, Delft University of Technology, ²School of Geographical Sciences, University of Bristol

Over the past few decades, a substantial body of research has focused on the issue of segregation of ethnic groups. Research has shown that segregation generally declines over time as migrant groups gradually become more assimilated in the host society. Despite the bulk of empirical evidence for this finding, researchers have shown that there continues to be unexplained variation in these patterns between different groups. Most studies in this field use indices to measure segregation. A major shortcoming, however, is that these indices are sensitive to geographical definitions and the underlying data construction. A decline in segregation levels in a particular area thus provides hardly any insight into the different spatial patterns of ethnic groups. Segregation of one ethnic group might decline in one area and increase in another, or occur on a different spatial scale. Related to this is the fact that many segregation studies tend to treat neighborhoods as static entities and neglect the dynamic character of neighborhoods in explaining changes in levels or patterns of segregation. As a result, the relation between processes of segregation, neighborhood change and the spatial patterns of different ethnic groups remains unclear. This study focuses on trajectories of neighborhood change in relation to the concentration of different ethnic groups. A latent class growth model was estimated using Dutch register data at the level of 500 by 500 meter grid cells over the period 1999 to 2013. Our findings generally show similar trajectories for different ethnic groups, albeit with a different spatial outcome. Ethnic minorities continue to be concentrated in particular neighborhoods in large cities.


Selective licensing of landlords and the identification of the private rented sector
Les Mayhew¹, Gillian Harper²; ¹Cass Business School, City University London, ²Mayhew Harper Associates Ltd.

There is an awareness that the housing market in the UK has changed significantly in the last decade or so. A key trend has been a decline in social housing combined with a lack of affordable homes, which has resulted in a massive growth in the private rented sector. It is claimed that many local authorities (LAs) have seen negative externalities as a result, including increased antisocial behaviour or ASB (e.g. noise, dilapidations, criminal damage, untidy gardens and rubbish). One option open to LAs is ‘Selective Licensing’, in which private landlords must apply for a licence to operate, which can be revoked as necessary. A condition for introducing it is that an LA must demonstrate a direct link between the private rented sector and ASB. However, this is not straightforward since there is no information on whether a property is privately let or not. This paper describes a multivariate model using local administrative data sets and logistic regression to identify which properties are privately rented. ASB data are used to show whether the private sector is partly responsible for this nuisance, with the caveat that any evidence produced as a result must be able to stand up to possible legal challenge in a court of law. Results are illustrated using GIS and statistical techniques to establish both property tenure and the nuisance radius of every ASB event, including its persistence and intensity. Illustrative results will draw on case studies in ten LAs where the techniques have been applied, including both successes and pitfalls.


Projecting the regional explicit development of the population structure and social heterogeneity in Nepal
K. C. Samir, Markus Speringer, Wittgenstein Center for Demography and Global Human Capital, Vienna, Austria (IIASA, VID/ÖAW, WU) 

Population projections at national level and for smaller administrative units can provide essential information for planning and implementing government policies, including the allocation of budget and resources. Often, such projections use crude methods and are not based on evidence and argument-based assumptions about the future. Consequently, the results, when compared later with actual demographic rates, turn out to be far from reality. Acknowledging the fact that the future is uncertain, we attempted to minimize the level of uncertainty, firstly, by understanding the important forces (behavioral, cultural and socioeconomic factors) that affect future demographic events. Secondly, we constructed a baseline demographic scenario based not only on the continuation of past trends but also on how these other forces will develop and affect demographic events in the future. In addition, in collaboration with the Ministry of Health and Population, we developed multiple policy-relevant (population and health) projection scenarios for more than 4000 administrative units in Nepal for the period 2011 to 2031. We used data, both published tables and sampled microdata, from the two most recent censuses (2001 and 2011) along with several rounds of Demographic and Health Survey data. In this paper, we will present our methodological approach to estimating and projecting fertility, mortality and migration at small spatial levels, as well as the population projection model. Preliminary results show that, in Nepal, both internal and international migration are and will be the main (direct and indirect) cause of changing population dynamics at different spatial levels.


Estimating life-tables for very small areas in a national context: an analysis of Israeli Statistical Areas
Jon Anson, Department of Social Work, Ben-Gurion University of the Negev 

Single indicators of the level of mortality, such as life expectancy at birth or the standardised mortality ratio, are extremely useful for providing an overall indication of the mortality differences between populations. Such indicators have proven extremely useful in showing the importance of social differences for aggregate population health. However, as mortality reaches minimal levels, and the variations between populations become relatively small, it becomes important to distinguish young from middle- and old-age mortality, in order to see which particular social conditions are important in their effects on mortality, and at which ages. Obtaining such detail requires the construction of life tables and the identification of the risks of mortality at each age, but such fine detail can be elusive when examining small populations for which the number of people in each age group is small, the number of deaths minuscule, and estimation errors are large. A possible solution to this problem is to estimate age-specific mortality rates simultaneously for all the subunits of a particular country, using the reported number of deaths, by age and sex, for each unit as the input data. The national mortality rates then serve as a model from which local deviations are estimated using a multi-level model with sex- and age-specific cells as units nested within age groups and local areas at the second level. Age-sex specific rates are then estimated as a function of local-level social conditions, both directly and in interaction with age. This method has the advantage of estimating considerably fewer parameters than would be necessary if each age-sex mortality rate were estimated independently for each regional unit.
From the resulting mortality rates we can then estimate local level life tables by sex, and derive summary data such as life expectancy at birth as well as the probability of surviving to age 35 (l35) as an estimate of mortality at younger ages, and the modal age at death as an estimate of mortality prematurity or delay in later life. We use population and social data for the 1280 statistical areas (enumeration districts) from the 1995 Israeli census, together with numbers of deaths over the five years 1993 to 1997. The social data for each unit include estimates of the average standard of living, a measure of traditionality of the family structure and population group (Jewish or Palestinian-Arab).
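The final step, turning a set of age-specific mortality rates into a life table and its summary measures, can be sketched as follows. This is a generic abridged life table and not the author's procedure: it assumes 5-year age groups, deaths spread evenly within each interval, an open-ended last group, a crude modal age at death taken as the start of the closed interval from age 10 with the most deaths, and an invented Gompertz-like schedule of rates.

```python
from math import exp


def life_table(rates, width=5):
    """Survivors lx, deaths dx and person-years Lx from death rates per age group.

    The last rate is treated as applying to an open-ended age group.
    """
    lx, dx, Lx = [1.0], [], []
    for m in rates[:-1]:
        q = width * m / (1.0 + 0.5 * width * m)   # probability of dying in the interval
        d = lx[-1] * q
        dx.append(d)
        # Survivors live the full interval, decedents half of it on average.
        Lx.append(width * (lx[-1] - d) + 0.5 * width * d)
        lx.append(lx[-1] - d)
    dx.append(lx[-1])                 # everyone left dies in the open interval
    Lx.append(lx[-1] / rates[-1])
    return lx, dx, Lx


def summarise(rates, width=5):
    """Life expectancy at birth, survival to age 35 and a crude modal age at death."""
    lx, dx, Lx = life_table(rates, width)
    ages = [i * width for i in range(len(rates))]
    e0 = sum(Lx) / lx[0]
    l35 = lx[ages.index(35)]
    # Skip childhood and the open interval, whose width is not comparable.
    candidates = [(d, a) for d, a in zip(dx[:-1], ages[:-1]) if a >= 10]
    modal_age = max(candidates)[1]
    return e0, l35, modal_age


# Invented schedule for ages 0, 5, ..., 85+; not estimated from any data.
toy_rates = [0.0002 * exp(0.09 * age) for age in range(0, 90, 5)]
e0, l35, modal_age = summarise(toy_rates)

# Sanity case: a constant rate of 0.02 implies a life expectancy of 1 / 0.02 = 50.
flat_e0, flat_l35, _ = summarise([0.02] * 18)
```

In the setting of the abstract, `toy_rates` would be replaced by the model-based rates for one statistical area and sex, giving one set of summary measures per area.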