
Robust analysis of rating-scale data

Wednesday 10 December 2025

Dr Max Welz introduces research aiming to make statistical analyses robust against so-called ‘contamination’ in rating data stemming from low-quality survey responses.

Empirical research in the social, health, and economic sciences relies heavily on the statistical analysis of data measured on rating scales (closed-ended survey questions that use a set of categories to measure a respondent's sentiment or perception of a quantitative or qualitative attribute). The results of such analyses have a far-reaching impact on society. For example, policymakers monitor public opinion via rating-scale data collected in surveys, psychologists collect rating-scale data in experiments to gain new insights into human behaviour, and companies let testers rate various aspects of a new product before launching it.

However, data quality is often an issue in rating-scale data, particularly when collected online. For instance, respondents may not always respond attentively or truthfully (a phenomenon known as ‘careless responding’), product ratings may not be genuine but created by bots (‘review bombing’), or responses may not be recorded correctly (‘data entry error’). Such aberrant responses can have devastating effects on the analysis of rating-scale data by creating statistical biases that may invalidate research results.


My current work with statisticians Patrick Mair and Andreas Alfons (coauthors of ‘Robust Estimation of Polychoric Correlation’ (Welz et al, 2025)) addresses the issue of low-quality responses to rating scales in a principled way. Specifically, we develop novel statistical tools designed to detect such responses and overcome their adverse effects. As such, our work aims to make statistical analyses robust against so-called ‘contamination’ in rating data stemming from low-quality responses. In particular, I propose C-estimators (Welz, 2024), which robustify a broad and general class of models for the analysis of rating-scale data and, more generally, categorical data. In addition to their practical usefulness, C-estimators possess attractive mathematical and computational properties. For example, they retain the statistical power of existing non-robust methods and come at no additional computational cost. Both properties are highly unusual for robust statistical methods, which typically suffer from diminished power and increased computational cost.
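To fix ideas, ‘contamination’ can be pictured with the classic mixture model from robust statistics (the notation below is generic and illustrative, not necessarily that of the cited papers):

F_observed = (1 − ε) · F_model + ε · G,

where F_model is the distribution implied by the assumed rating-scale model, G is an arbitrary (unknown) distribution generating low-quality responses such as careless answers, bot-generated ratings, or entry errors, and ε is the unknown fraction of contaminated observations. A robust estimator aims to recover the parameters of F_model even when ε is strictly positive.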

As a particularly relevant special case, we apply C-estimation to robustly estimate the correlation between two rating-scale variables in questionnaires (Welz et al, 2025). In this context, we demonstrate how C-estimation remains insensitive to the damaging effects of careless responses while simultaneously helping to identify them. For instance, using empirical data from personality psychology, we show that the standard estimate of the correlation between the self-report questionnaire items ‘I am envious’ and ‘I am not envious’ is compromised by a small number of presumably careless respondents who strongly agree (or strongly disagree) with both items. Such mutually contradictory response patterns are not expected from attentive respondents and may indicate carelessness. In contrast, despite not knowing the item labels, the robust C-estimator flags (and implicitly downweights) such response patterns as ‘outlying’ because their response structure disagrees with that of the vast majority of (presumably attentive) respondents. Consequently, the robustly estimated correlation between ‘I am envious’ and ‘I am not envious’ amounts to -0.93, whereas the ordinary estimate is only about -0.62. Since the two items are polar opposites, one would expect a strong negative correlation, so the ordinary estimate is unexpectedly weak. The resulting robustly estimated correlation matrix can then be used in subsequent analyses, such as (but not limited to) psychometric structural equation models (SEMs). We are currently working on robustifying SEM analyses in this way.
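To make the attenuation effect concrete, here is a minimal R sketch using simulated data (not the empirical data or code from the paper): it generates two opposite five-point items for attentive respondents, appends a handful of careless respondents who strongly agree with both items, and compares the ordinary Pearson correlation with and without them.

# Simulated illustration of how a few careless responses attenuate a correlation
set.seed(1)
n <- 500                                    # attentive respondents
z <- rnorm(n)                               # latent 'enviousness'
cuts <- c(-Inf, -1, -0.3, 0.3, 1, Inf)      # thresholds for a 5-point scale
item1 <- cut(z, breaks = cuts, labels = FALSE)                        # 'I am envious'
item2 <- cut(-z + rnorm(n, sd = 0.3), breaks = cuts, labels = FALSE)  # 'I am not envious'

# Append 5% careless respondents who strongly agree with both items (category 5)
n_careless <- 25
item1 <- c(item1, rep(5, n_careless))
item2 <- c(item2, rep(5, n_careless))

cor(item1[1:n], item2[1:n])  # strong negative correlation among attentive respondents
cor(item1, item2)            # noticeably attenuated once careless responses are included

A robust method such as the C-estimator is designed to flag and downweight the appended rows rather than let them drag the estimate towards zero.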

During a visit to the LSE Department of Statistics this past Autumn, I worked with Department professors Irini Moustaki and Yunxiao Chen on high-dimensional factor models for rating scales, that is, data comprising hundreds (possibly thousands) of rating-scale questionnaire items. In this collaboration, we set out to apply C-estimators to robustly estimate such models by identifying and accounting for careless responding and other types of low-quality data points. The high-dimensional nature of this problem poses a number of mathematical and computational challenges, but I was able to resolve them during my visit, thanks in no small part to the great teamwork and cooperation with the Department.
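For readers unfamiliar with such models, a generic ordinal factor model (the notation here is illustrative and not taken from the work in progress) assumes that each observed rating X_j arises by thresholding a latent continuous variable:

X*_j = λ_j' η + e_j,   with   X_j = k   whenever   τ_{j,k−1} < X*_j ≤ τ_{j,k},

where η is a low-dimensional vector of latent factors, λ_j the item's loadings, and τ_{j,k} the category thresholds. With hundreds or thousands of items, the matrix stacking all loadings λ_j becomes very large, which is one source of the mathematical and computational challenges mentioned above.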

My coauthors and I are committed to open science. Therefore, all of our manuscripts are publicly and freely available from preprint servers such as arXiv. In addition, all of our proposed methodology is implemented in open-source software packages that are continuously extended and actively maintained. In particular, robust C-estimation is implemented in the R package ‘robcat’ (‘ROBust CATegorical data analysis’), which is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=robcat. Feel free to try it yourself!
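If you would like to experiment, a minimal way to get started in R is sketched below. The install.packages() and help() calls are standard R; the specific functions for robust C-estimation are deliberately not named here, so please consult the package documentation for the exact interface.

install.packages("robcat")   # install the package from CRAN
library(robcat)              # load it
help(package = "robcat")     # overview of the available functions and their documentation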

By Dr Max Welz, University of Zurich.

REFERENCES

Welz, M., Mair, P., & Alfons, A. (2025). Robust estimation of polychoric correlation. Psychometrika (forthcoming). https://doi.org/10.1017/psy.2025.10066

Welz, M. (2024). Robust estimation and inference for categorical data [arXiv:2403.11954]. https://doi.org/10.48550/arXiv.2403.11954