Social Statistics

Anyone can make a theory or a claim about anything, but the important thing is whether you can provide real-world evidence to support that claim, and that is where Statistics comes into its element.

Research in Social Statistics is concerned with the development of statistical methods that can be used across the social and human sciences. Statisticians play an essential role in all aspects of social inquiry, including study design, measurement, data linkage, development of statistical models that account for the complex structure of social data, model selection and evaluation, and modelling, analysis and interpretation of the data to answer substantive research questions. 

Members of the Social Statistics group have interests in statistical methods in each of these areas and regularly collaborate with social scientists whose questions motivate new lines of methodological research. We have experience in a range of social science disciplines, including demography, education, epidemiology, psychology and sociology.

Research areas

Members of the Social Statistics group conduct research in many areas of statistical theory and methods that are important for answering research questions in the modern social sciences. 

The methods we research are relevant to many types of research question, including out-of-sample prediction based on complex data, description of population relationships using data from surveys and other sources, and causal inference from experimental or observational studies using approaches such as regression discontinuity, interrupted time series, and synthetic and negative control designs.

Data in these applications are often complex, high-dimensional and challenging to analyse. We develop methods that can cope with this complexity, for example: multivariate analysis of high-dimensional data; analysis of clustered data with complex correlation structures such as multivariate longitudinal data and multiprocess survival data; detection of outliers; analysis of problems with missing data, drop-out, misclassification and measurement error; dealing with non-informative missingness in the presence of time-varying confounding in causal inference; and combining data from multiple sources.

We develop and employ various statistical frameworks, models, methods of estimation, and computational algorithms. These include: different types of latent variable, mixture and random effects models for continuous and categorical variables; Gaussian processes; interpretable machine learning methods; marginal modelling; composite likelihood methods; models for dependence using reproducing kernel Hilbert space methods; Markov decision and reinforcement learning methods; Bayesian methods; and computationally efficient Markov chain Monte Carlo and sequential Monte Carlo techniques to facilitate parameter estimation, statistical inference, model choice, and prediction.


The statistical methods that are studied and developed by members of the Social Statistics group can be used in substantive research in the social and human sciences, business and policy. We work on such applications in collaboration with researchers in criminology, demography, education, epidemiology, political science, psychology, social policy, and sociology, as well as researchers outside academia. Many of these projects are funded by research grants from external funders such as the Economic and Social Research Council and the Wellcome Trust. 

Areas of application that we have worked on include international large-scale assessments in education; methods of election polling; effects of statin prescription in the general population; public attitudes to the police; longitudinal analysis of exchanges of financial and practical support between parents and their adult children; the role of education in social class mobility; resource allocation and sequential decision making on crowdsourcing platforms; use of stochastic epidemic models for the infectious diseases COVID-19, influenza, HIV and sheeppox; estimation of the prevalence of problem gambling; cheating detection in educational tests; the effect of changing sentencing guidelines on sentence severity; ethnic diversity and social cohesion; associations between change in beliefs and mood following cardiac surgery and subsequent attendance at outpatient rehabilitation; sequential design of personalised learning systems; safety citizenship behaviour in organizations; the impact of austerity measures on mental health, in particular of ethnic minority communities in London; and exit poll forecasting of election results.

Selected publications

Bakk, Z. and Kuha, J. (2018). Two-step estimation of models between latent classes and external variables. Psychometrika, 83, 871-892. 

Bergsma, W. (2020). Regression with I-priors. Econometrics and Statistics, 14, 89-111.

Chen, Y., Lee, Y.-H., and Li, X. (2021). Item quality control in educational testing: Change point model, compound risk, and sequential detection. To appear in Journal of Educational and Behavioral Statistics.  

Chen, Y. and Li, X. (2021). Determining the number of factors in high-dimensional generalised latent factor models. To appear in Biometrika. 

Doretti, M., Geneletti, S., and Stanghellini, E. (2017). Missing data: A unified taxonomy guided by conditional independence. International Statistical Review, 86, 189-204. 

Dureau, J., Kalogeropoulos, K., Vickerman, P., Pickles, M., and Boily, M. C. (2016). A Bayesian approach to estimate changes in condom use from limited human immunodeficiency virus prevalence data. Journal of the Royal Statistical Society, Series C, 65, 237-257.

Geminiani, E., Marra, G., and Moustaki, I. (2021). Single and multiple-group penalized factor analysis: a trust-region algorithm approach with integrated automatic multiple tuning parameter selection. Psychometrika, 86, 65-95.

Geneletti, S., Ricciardi, F., O’Keeffe, A. G., and Baio, G. (2019). Bayesian modelling for binary outcomes in the regression discontinuity design. Journal of the Royal Statistical Society, Series A, 182, 983-1002.

Katsikatsou, M., Moustaki, I., and Jamil, H. (2022). Pairwise likelihood estimation for confirmatory factor analysis models with categorical variables and data that are missing at random. British Journal of Mathematical and Statistical Psychology, 75, 23-45.

Kuha, J., Bukodi, E., and Goldthorpe, J. H. (2021). Mediation analysis for associations of categorical variables: The role of education in social class mobility in Britain. Annals of Applied Statistics, 15, 2061-2082. 

Malesios, C., Demiris, N., Kalogeropoulos, K., and Ntzoufras, I. (2017). Bayesian epidemic models for spatially aggregated count data. Statistics in Medicine, 36, 3216-3230.

Shi, C., Xu, T., Bergsma, W., and Li, L. (2021). Double generative adversarial networks for conditional independence testing. Journal of Machine Learning Research, 22, 1-32. 

Steele, F., Clarke, P. S., and Kuha, J. (2019). Modeling within-household associations in household panel studies. Annals of Applied Statistics, 13, 367-392.

Steele, F. and Grundy, E. (2021). Random effects dynamic panel models for unequally-spaced multivariate categorical repeated measures: an application to child-parent exchanges of support. Journal of the Royal Statistical Society, Series C, 70, 3-23. 

Academic and research staff


Wicher Bergsma - Professor

Research interests: Reproducing kernels; dependence modelling; graphical models; I-priors; categorical data.


Yunxiao Chen - Assistant Professor

Research interests: Latent variable models; high-dimensional multivariate analysis; empirical Bayes; process data; sequential decision making.

Sara Geneletti - Associate Professor

Research interests: Causal inference; natural experiments; Bayesian methods; synthetic controls; graphical models.

Kostas Kalogeropoulos - Associate Professor

Research interests: Bayesian inference; stochastic epidemic modelling; factor analysis; sequential learning. 

Jouni Kuha - Professor

Research interests: Categorical data; incomplete data problems; latent variable modelling; survey data analysis.

Irini Moustaki - Professor

Research interests: Latent variable and structural equation models; estimation methods; treatment of missing values; outlier detection.


Fiona Steele - Professor

Research interests: Multilevel modelling; longitudinal data analysis; event history analysis; multivariate analysis.

Research students

Zackary Allinson

Research interests:

Sze Ming Lee

Research interests: Large-scale data analysis, latent variable modelling, survival analysis and quantile regression.


Xinyi Liu

Research interests:

Seyedpouya Mirrezaeiroudaki

Research interests:


Motonori Oka

Research interests: Statistics and machine/deep learning for the education and social sciences.

Motonori Oka’s research interests lie broadly in statistics and machine/deep learning for the education and social sciences, with a particular focus on methodological advances in latent variable modeling that contribute to a better understanding of the complex phenomena behind human behavior. His research centers on scalable psychometrics, education data science, and statistics education. In particular, he is interested in scalable estimation algorithms for latent factor models and their theory. He holds a BA in Psychology from the University of Tsukuba and an MA in Education from the University of Tokyo.

August Shen

Research interests: