Innovative data strand abstracts


Innovative data, methods & models. Tuesday 13 September 11:00am 

Equalities in a Stationary Population: Observations, a Conjecture, and Applications
David A. Swanson 1, Lucky M. Tedrow 2, 1 University of California Riverside, 2 Western Washington University

Although many of them are apparent and some of those that are not so apparent have been described, equalities represent a defining characteristic of a stationary population. In addition to obvious equalities, such as that between the crude birth rate and the crude death rate, research has revealed that mean years lived is equal to mean years remaining and also that the distribution of age composition is equal to the distribution of remaining lifetimes. To these equalities, the following can be added: (1) mean age is equal to mean years lived (and therefore equal to mean years remaining); (2) mean age can be expressed as a function of Total Years Lived by the stationary population and its life expectancy at birth, which implies that its mean age can be expressed as a function of its crude birth rate as well as its crude death rate. Because mean age is equivalent to mean years lived and mean years remaining, these also can be expressed as a function of Total Years Lived and, respectively, life expectancy at birth, the crude birth rate and the crude death rate. We conjecture that the sum of mean years lived and mean years remaining exceeds life expectancy at birth in a given stationary population. Along with other applications, we note that if true, the conjecture has actuarial implications.
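
The equalities listed in this abstract can be checked on a toy life table. The following is a minimal sketch, not the authors' method: the survivorship column lx is hypothetical, age is measured in completed periods, and each person is assumed to count for one full period at every age reached. Exact fractions are used so the equalities hold without rounding error. The sketch only checks the equalities; it does not test the conjecture.

```python
from fractions import Fraction

# Hypothetical survivorship column: l_x = survivors to exact age x (not real data).
lx = [1000, 950, 900, 800, 600, 300, 100, 0]
n = len(lx)
total = sum(lx)  # size of the stationary population under the toy convention

# Deaths at each age: d_x = l_x - l_{x+1}; nobody survives past the last age.
dx = [lx[x] - lx[x + 1] for x in range(n - 1)] + [lx[-1]]

# Age distribution: the number of people aged x is proportional to l_x.
age_dist = [Fraction(l, total) for l in lx]

# Remaining-lifetime distribution: of the l_x people aged x, d_{x+y} will die
# at age x + y, i.e. have y periods left. Summing over x gives the count for y.
remaining_dist = [
    Fraction(sum(dx[x + y] for x in range(n - y)), total) for y in range(n)
]

mean_age = sum(x * p for x, p in enumerate(age_dist))
mean_remaining = sum(y * p for y, p in enumerate(remaining_dist))

e0 = Fraction(total, lx[0])     # life expectancy at birth, in periods
cbr = Fraction(lx[0], total)    # births per person per period
cdr = Fraction(sum(dx), total)  # deaths per person per period

print("age distribution equals remaining-lifetime distribution:",
      age_dist == remaining_dist)
print("mean age:", float(mean_age), "mean remaining:", float(mean_remaining))
print("CBR:", float(cbr), "CDR:", float(cdr), "1/e0:", float(1 / e0))
```

In this convention the sum over x of d_{x+y} collapses to l_y, which is why the two distributions coincide term by term.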

A Sensitivity Analysis of Transient Population Dynamics: The Importance of Transients and their Key Drivers
Claire Dooley 1, Jakub Bijak 2, Stuart Townley 3, David Hodgson 4, Thomas H. G. Ezard 1, 1 Centre for Biological Sciences, University of Southampton, 2 Social Statistics & Demography, University of Southampton, 3 Environment & Sustainability Institute, University of Exeter, 4 Centre for Ecology and Conservation, University of Exeter

Fluctuations in a population’s size and structure occur constantly in a changing environment. In Europe and beyond, the Second World War left millions dead but those who survived gave birth to a baby-booming generation. As baby-boomers (or busters) grow older and progress through their population’s age structure, short-term (transient) population dynamics prevail. Transient dynamics are particularly important because they can result in the population’s composition and/or size differing from the expected long-term (asymptotic) behaviour. It is therefore crucial to understand and predict the positive and negative implications transients have on future population size and composition in order to make predictions about future population health, social security and economic growth. Here we examine the contribution of transient growth to overall country-level population growth across Europe between 2002 and 2008. We use spatial matrix population models and sensitivity (and elasticity) analyses to quantify the relative impact perturbations in age-specific fertility, mortality and migration have on country-level transient growth. We utilize data from EuroStat and the Integrated Modelling of European Migration (IMEM) Databases. Our results show that the relative contribution of transient growth to overall population growth varies from 0.36 (Latvia, 2013) to 1 (Ireland, 2008), illustrating the importance of transient growth, in addition to long-term (asymptotic) growth, for European countries. We find that the sensitivity of transients to changes in different demographic factors varies greatly across countries. Here, we present a full assessment of the demographic drivers responsible for the inter-country variation in transient dynamics. 
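
The distinction between transient and asymptotic growth can be illustrated with a small matrix population model. This is a minimal sketch under stated assumptions: the 3-age-class Leslie matrix and the starting population are hypothetical, migration is ignored, and the ratio of realised to asymptotic growth is only one possible way to summarise a transient. The abstract does not specify the measure the authors use.

```python
def project(leslie, n):
    """Advance the age-structured population vector n by one time step."""
    return [sum(a * x for a, x in zip(row, n)) for row in leslie]


def asymptotic_growth(leslie, steps=500):
    """Power iteration: long-run growth rate and stable age structure."""
    w = [1.0 / len(leslie)] * len(leslie)
    lam = 1.0
    for _ in range(steps):
        nxt = project(leslie, w)
        lam = sum(nxt)              # w sums to 1, so this is one-step growth
        w = [x / lam for x in nxt]  # renormalise so the structure sums to 1
    return lam, w


# Hypothetical Leslie matrix: fertilities in the first row, survival below.
LESLIE = [
    [0.0, 1.2, 0.8],
    [0.6, 0.0, 0.0],
    [0.0, 0.8, 0.0],
]

lam, stable = asymptotic_growth(LESLIE)

# A young-heavy starting population, far from the stable age structure.
n0 = [100.0, 10.0, 5.0]
n1 = project(LESLIE, n0)
realised = sum(n1) / sum(n0)      # growth actually observed over one step
transient_ratio = realised / lam  # departure from asymptotic growth

print("asymptotic growth rate:", round(lam, 4))
print("realised one-step growth:", round(realised, 4))
print("realised / asymptotic:", round(transient_ratio, 4))
```

Starting from the stable structure instead of n0 would give a realised growth equal to the asymptotic rate, so any gap between the two reflects the initial age structure alone.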

Projections of the Influence of Migrant Ages on Living Standards
Nick Parr 1, Ross Guest 2, 1 Macquarie University, 2 Griffith University 

This paper assesses the implications of alternative age profiles of newly-arrived international migrants for the age profiles and size of the population and for living standards, focusing on Australia. In doing so it aims to provide a better appreciation of the interaction of the migrant age profile with patterns of fertility, mortality, and labour force participation in determining the aggregate age distributions of the population and national workforce and the productivity of the national workforce. Method: Projections of living standards using the same fertility, mortality, labour force participation and productivity assumptions but differing in the age composition of the migrant intake are prepared. Following Parr and Guest (2014), summary measures of the values of the projections are compared. The sensitivity of results to variations in fertility, mortality and labour force participation assumptions is assessed. Results: Preliminary results show that, under the continuation of current fertility, mortality and life expectancy, 20-24 is the optimum age for migrants; differences in the value of migrants across the younger working ages are slight. However, the value of migrants declines considerably with increasing age above age 45. Potential Applications: Potentially the analysis could strengthen the evidence base for targeting immigrant selection policies. 

Papyro, Silico, Vitro
Jonathan Gray 1, Jakub Bijak 1, Seth Bullock 2, 1 University of Southampton, 2 University of Bristol 

In the last few years, in silico approaches, and particularly agent-based modelling, have seen considerable interest in demography. These are in part attractive because they offer an opportunity for experiment not otherwise available at the population level. This raises the issue of validation of simulations. A crucial component of this is validation at the population level, but we must not neglect the need to validate the individual. In this work we report results of lab experiments designed to identify which of four decision mechanisms used by agents in an agent-based model of older adult care is most plausible. The process of experimental design employs the simulation model to maximise the information gain from the experiments, by identifying parameters where a single decision model predicts behaviour different from the rest. The resulting parameters then form the conditions for a set of experimental treatments. This allows us to use Bayesian Model Selection to quantify the support for each of the four decision rules. This approach hints at the promise for simulation as an experimental tool in population science, and represents a small step towards closing the loop between simulation and data, by guiding further empirical research.
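
One common way to quantify support for competing models is to turn their log marginal likelihoods into posterior model probabilities. The following is a minimal sketch of that step only, assuming equal prior weight on each model unless priors are supplied. The rule names and log-evidence values are hypothetical placeholders, not results from the study, and the abstract does not say how the authors compute the evidence.

```python
import math


def posterior_model_probabilities(log_evidences, priors=None):
    """Posterior probability of each model given its log marginal likelihood."""
    k = len(log_evidences)
    if priors is None:
        priors = [1.0 / k] * k  # assumption: no model is favoured a priori
    log_post = [le + math.log(p) for le, p in zip(log_evidences, priors)]
    shift = max(log_post)       # subtract the maximum for numerical stability
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log marginal likelihoods for four decision rules.
rules = ["rule_a", "rule_b", "rule_c", "rule_d"]
log_evidence = [-10.0, -12.0, -11.0, -15.0]

probs = posterior_model_probabilities(log_evidence)
for name, p in zip(rules, probs):
    print(name, round(p, 3))
```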

Probabilistic population estimation & forecasting. Tuesday 13 September 1:30pm

Integrated Probabilistic Population Forecasts for the United Kingdom: A Bayesian Approach
Jason Hilton, Jakub Bijak, Erengul Dodd, Jonathan Forster, Peter Smith, Economic and Social Research Council Centre for Population Change, University of Southampton 

We present a fully integrated and dynamic Bayesian approach to forecast populations by age and sex. This probabilistic approach combines models for age- and sex-specific fertility, mortality, immigration and emigration within a cohort-component projection framework, and provides coherent population forecasts with associated measures of uncertainty. The methodology may be adapted to handle different data types and sources of information. We analyse historic data for the United Kingdom and forecast the components of population change, and hence the overall future population size and structure. We also compare the results obtained from different forecast models for age-specific fertility, mortality, and migration. In doing so, we demonstrate the flexibility and advantages of adopting the Bayesian approach for population forecasting and highlight areas where this work could be extended. 

Bayesian multiregional population forecasting: England
Arkadiusz Wiśniowski 1, James Raymer 2, 1 University of Manchester, 2 Australian National University 

In this paper, we extend the well-known multiregional population projection model developed by Andrei Rogers and colleagues to be fully probabilistic. Multiregional models provide a general and flexible platform for modelling and analysing population change over time. They allow the combination of all the main components of population change by age with various transitions that population groups may experience throughout their life course. What distinguishes these models from ordinary projections is that they include transition matrices of interregional migration by age. This information is an important component of subnational population change yet models for forecasting the patterns for use in population projections are largely non-existent. National statistical offices tend to rely on simple deterministic assumptions regarding net migration or gross flows of in-migration and out-migration. These models do not take into account the linkages between origins and destinations and often have to be adjusted to ensure zero net migration and the same totals for in-migration and out-migration. We focus on the full matrix of flows to avoid this problem. To deal with the large number of possible flows and provide measures of uncertainty, we develop a Bayesian hierarchical model to forecast age-specific interregional migration, and then include this information with probabilistic forecasts of regional births, deaths, immigration and emigration. The results demonstrate the differences that arise from different specifications and the promise of the general approach. 
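
The point about the full matrix of flows can be illustrated with a toy redistribution step. This is a minimal sketch, not the authors' model: the three regions, their populations and the origin-destination probabilities are hypothetical, and age, births, deaths and international migration are left out. Because every out-migrant from one region is an in-migrant to another, total in-migration and total out-migration agree by construction and no adjustment is needed.

```python
def redistribute(pop, transition):
    """Apply an origin-destination matrix to regional populations.

    transition[i][j] is the assumed probability that a resident of region i
    lives in region j at the end of the interval (each row sums to 1).
    """
    k = len(pop)
    flows = [[pop[i] * transition[i][j] for j in range(k)] for i in range(k)]
    new_pop = [sum(flows[i][j] for i in range(k)) for j in range(k)]
    net = [new_pop[j] - pop[j] for j in range(k)]
    return new_pop, net


# Hypothetical populations and interregional transition probabilities.
population = [5000.0, 3000.0, 2000.0]
TRANSITION = [
    [0.95, 0.03, 0.02],
    [0.04, 0.94, 0.02],
    [0.05, 0.05, 0.90],
]

new_population, net_migration = redistribute(population, TRANSITION)
print("end-of-interval populations:", new_population)
print("net internal migration by region:", net_migration)
```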

Modelling the impact of lifestyle factors and chronic diseases on disability and dependency through microsimulation
Carol Jagger 1, 2, Andrew Kingston 1, 2, Heather Booth 3, 1 Institute of Health and Society, Faculty of Medicine, Newcastle University, 2 Newcastle University Institute for Ageing, 3 College of Arts and Social Sciences, Australian National University, Canberra, Australia, on behalf of the MODEM project team

Projections of the numbers of older people with disability or dependency often fail to account for changes in risk factors and treatments for the chronic disabling diseases. We aim to address this through a new microsimulation model, MicSIMPOP, which forms part of a larger project, MODEM, and aims to model the health and associated care needs of the English population to 2041 and the impact of interventions for risk factor reduction and disease prevention and treatment. MicSIMPOP is modelled on a previous Australian microsimulation model, DYNOPTASim, and uses a discrete time approach. The baseline data (and monthly transition probabilities) draw on three longitudinal studies: Understanding Society waves 1-2 (ages 35 and over); the English Longitudinal Study of Ageing waves 5-6 (ages 50 and over); and the Cognitive Function and Ageing Study II (ages 65 and over). These data were re-weighted to the England population in 2014, cloned to give a weight of one for each individual and then a 1% random sample was taken. Baseline characteristics generated on individuals are: sociodemographic (age, gender, living arrangements, marital status, education); lifestyle behaviours (smoking, alcohol, physical activity, BMI); diseases (cognitive impairment, CHD, stroke, hypertension, diabetes, respiratory disease, arthritis, cancer, hearing and vision impairment); disability/dependency measured by a time-based measure. Survival probabilities were generated from the qx schedules underlying the 2014-based population projections for England. The paper will describe the creation and validation of MicSIMPOP as well as the first results of projections of specific diseases, disability and multimorbidity over the next 25 years.
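
A discrete time model with monthly steps needs monthly probabilities, whereas qx schedules are annual. The following is a minimal sketch of one way to bridge the two, assuming a constant hazard within each year of age. The abstract does not state how MicSIMPOP makes this conversion, and the function names and the example qx value are hypothetical.

```python
import random


def monthly_probability(annual_q):
    """Monthly death probability implied by an annual q_x (constant hazard)."""
    return 1.0 - (1.0 - annual_q) ** (1.0 / 12.0)


def simulate_year(n_alive, annual_q, rng):
    """Step a group of individuals through 12 monthly survival draws."""
    p_month = monthly_probability(annual_q)
    for _month in range(12):
        # Each surviving individual faces an independent monthly draw.
        n_alive = sum(1 for _person in range(n_alive) if rng.random() >= p_month)
    return n_alive


annual_q = 0.12  # hypothetical annual probability of death at some age
p_month = monthly_probability(annual_q)
implied_annual = 1.0 - (1.0 - p_month) ** 12  # compounding 12 months back up

survivors = simulate_year(1000, annual_q, random.Random(2014))
print("monthly probability:", round(p_month, 5))
print("implied annual probability:", round(implied_annual, 5))
print("simulated survivors out of 1000:", survivors)
```

Compounding the monthly probability over twelve steps recovers the annual qx, so the monthly simulation stays consistent with the annual schedule in expectation.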

Administrative data research. Wednesday 14 September 11:30am

Session convenor: Emma White, University of Southampton 

What can linked administrative data tell us about the population at small area level?
Alistair Dent, Neil Park, Office for National Statistics

The Office for National Statistics’ Census Transformation Programme is responsible for taking forward three high-level deliverables: a predominantly online census of all 26 million households and communal establishments in England and Wales; development of alternative administrative data census estimates, compared to the 2021 Census; improved and expanded population statistics through increased use of administrative data and surveys. In October 2015, the Programme published its first set of Administrative Data Research Outputs, demonstrating progress on the second deliverable. Linked administrative data were used to produce population estimates at local authority level by five-year age-sex groups for 2011, 2013 and 2014. The accompanying report contained four case studies outlining key quality issues with the new method, and offered users the opportunity to provide feedback. A well-received new data visualisation tool also provided users with an interactive way of understanding the quality of the research outputs. The next release in autumn 2016 is expected to include: an extension of the previous years’ time series to include LA estimates by single year of age and estimates for small area level (to Lower Layer Super Output Area); a new time series showing improvements to the methodology; estimates of the number of households at LA level; and research on income using combined PAYE and benefits data (from DWP and HMRC). This presentation will focus on the population estimates at small area level, including comparisons with official population estimates. The research on households and income is covered in a separate session.

Assessing uncertainty when combining administrative data to estimate population counts
Dilek Yildiz 1, 2, Peter W.F. Smith 3, 1 Wittgenstein Centre for Demography and Global Human Capital (IIASA, VID/ÖAW, WU), 2 Vienna Institute of Demography/Austrian Academy of Sciences, 3 University of Southampton 

The aim of this research is to develop a methodology to produce measures of uncertainty when combining administrative data to estimate population counts in the absence of a traditional census. It is well known that population estimates from administrative registers are subject to bias since the registers are not designed to collect information from the whole population. Previous research showed that combining aggregate level administrative data using log-linear models with offsets in a classical framework decreases the bias. However, it did not provide estimates of uncertainty for the final population estimates. This research extends the previous work by combining data sources in a probabilistic framework, producing population counts by age, sex and location, and estimating associated measures of precision. To illustrate the proposed methodology, the aggregate level Patient Register is combined with auxiliary information using log-linear models with offsets for the South East region of England within two probabilistic frameworks. First, the Census Models, which are the probabilistic equivalents of previously assessed models, provide the measures of uncertainty around the population count estimates. Second, the Sample Models further improve on the Census Models by combining the administrative data source with auxiliary information obtained from 10% and 5% representative samples from the population. The calibration of the models is assessed by comparing the resulting estimates to the ‘gold standard’ population estimates. We also present the true credible intervals around the population count estimates, as well as the posterior distributions of the estimated counts and the parameters.

Zone design for statistical disclosure control in administrative and linked microdata
James Robards 1, David Martin 1, 2, Chris Gale 2, 1 ESRC National Centre for Research Methods, University of Southampton, 2 ESRC Administrative Data Research Centre for England, University of Southampton 

The increase in spatially-referenced administrative and linked datasets presents growing challenges for statistical disclosure control. Such new forms of data typically contain both attribute detail and a large data volume, therefore increasing the risk of disclosure of information about individuals and enabling identification. Detailed spatial information may be important to the researcher but also increases risk. This paper is concerned with application of automated zone design tools to protect record-level datasets in a way that might be implemented by a data provider. Implementation could facilitate release of richer data to researchers preserving small area geographical associations, while not revealing actual locations. Using a synthetic microdataset of individual records with locality-level (MSOA) geography codes for England and Wales (variables: age, gender, economic activity, marital status, occupation, number of hours worked and general health), we synthesize address-level locations with reference to 2011 Census headcount data. These synthetic locations are then associated with a range of spatial measures and indicators (e.g. distance to GP). Implementation of the AZTool zone design software enables a bespoke, non-disclosive zone design solution, providing area codes that can be added to the research data without revealing actual locations to the researcher. Results will explain the spatial characteristics of the new synthetic dataset (which may have broader utility) and show changing risk of disclosure and utility when coding to spatial units from different scales and aggregations. Using the synthetic dataset will demonstrate the utility of the approach for a variety of linked and administrative data without any disclosure risk. 

Determinants of enrolment into and dropout from Higher Education: evidence from a cohort of 15,000 people created using the Administrative Data Research Centre – Northern Ireland
David M. Wright 1, 2, Dermot O'Reilly 1, 2, 1 Administrative Data Research Centre – Northern Ireland, 2 Queen’s University Belfast    

Understanding the factors that influence representation in Higher Education (HE) is of considerable public interest. Previous studies have concentrated on enrolment or dropout. We considered both, identifying individual, household and area factors that may influence representation. Two large administrative datasets, the 2011 Northern Ireland Census and longitudinal student enrolment data held by the Higher Education Statistics Agency (HESA), were linked, forming a cohort of 14,895 Northern Ireland-domiciled young people turning 18 during the 2010/11 academic year. We compared the characteristics of those enrolling/not enrolling for undergraduate degrees at UK Higher Education Institutions during the next three years. Multiple logistic regression was used to estimate associations with enrolment and dropout. 57% of cohort members enrolled. Household social class and housing tenure were most strongly associated with enrolment (those in rented accommodation were almost 70% less likely to enrol than those in the most expensive houses), followed by area level deprivation and long working hours outside school (both decreased enrolment). There were moderately strong associations with household car access, individual health status, religion and sex (males 27% less likely to enrol). Associations with household structure and rurality were much weaker. 6,615 cohort members were observed for three years post-enrolment, of whom 7.7% dropped out in the first year. Males were 39% more likely to drop out, but in general associations between dropout and household or area factors were unclear. In conclusion, household background appears to have a more profound influence on enrolment than individual or area characteristics but little influence on dropout.