Not available in 2022/23
MY360 Half Unit
Quantitative Text Analysis
This information is for the 2022/23 session.
Dr Blake Miller and Dr Friedrich Geiecke
This course is available on the BSc in Politics and Data Science. This course is available as an outside option to students on other programmes where regulations permit and to General Course students.
Knowledge of statistics and probability to the level of ST107 or equivalent.
The course surveys methods for systematically extracting quantitative information from text for social scientific purposes, starting with classical content analysis and dictionary-based methods, then moving on to classification methods and state-of-the-art scaling methods. It continues with probabilistic topic models and word embeddings, and concludes with an outlook on current neural-network-based models for text. The course lays a theoretical foundation for text analysis but mainly takes a practical and applied approach, so that students learn how to apply these methods in actual research. A common thread across the methods is that they can be reduced to a three-step process: first, identifying texts and units of texts for analysis; second, extracting from the texts quantitatively measured features - such as coded content categories, word counts, word types, dictionary counts, or parts of speech - and converting these into a quantitative matrix; and third, using quantitative or statistical methods to analyse this matrix in order to generate inferences about the texts or their authors. The methods are covered in a logical progression, and each technique will be applied hands-on to real texts using appropriate software.
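The three-step process described above can be illustrated with a minimal sketch. It is not taken from the course materials, which use R and the quanteda package; it uses only the Python standard library, and the two toy documents, the two-category dictionary, and the "net tone" score are all invented here purely for illustration.

```python
import re
from collections import Counter

# Step 1: identify the texts and the unit of analysis (one short document each).
# These toy documents are invented for illustration only.
texts = {
    "doc_a": "Profits rose and growth was strong. Strong results again.",
    "doc_b": "Losses widened and the outlook is weak. Growth stalled.",
}

# Hypothetical two-category dictionary; a real dictionary would be far larger.
dictionary = {
    "positive": {"profits", "growth", "strong"},
    "negative": {"losses", "weak", "stalled"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Step 2: extract features (word counts) and arrange them as a
# document-feature matrix: one row per document, one column per word type.
token_counts = {doc: Counter(tokenize(text)) for doc, text in texts.items()}
features = sorted(set().union(*token_counts.values()))
dfm = {doc: [counts[f] for f in features] for doc, counts in token_counts.items()}


def dictionary_counts(counts):
    """Sum the word counts falling into each dictionary category."""
    return {cat: sum(counts[w] for w in words) for cat, words in dictionary.items()}


# Step 3: analyse the quantitative features to say something about the texts.
# Here: (positive - negative) dictionary hits as a share of all tokens.
def net_tone(counts):
    cats = dictionary_counts(counts)
    return (cats["positive"] - cats["negative"]) / sum(counts.values())


scores = {doc: net_tone(counts) for doc, counts in token_counts.items()}

for doc in texts:
    print(doc, dictionary_counts(token_counts[doc]), round(scores[doc], 3))
```

In this sketch the matrix is a plain dictionary of count rows rather than a sparse matrix, which is adequate for two documents but would not scale to a real corpus. The score in step 3 is only one simple choice; scaling, classification, or topic models would take the same matrix as input.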
Lectures, class exercises and homework will be based on the use of the R statistical software package but will assume no background knowledge of that language.
A combination of classes and lectures totalling a minimum of 30 hours across Lent Term. This course has a Reading Week in Week 6 of LT.
One problem set in LT.
quanteda: An R package for quantitative text analysis. http://kbenoit.github.io/quanteda/
Benoit, Kenneth. 2020. “Text as Data: An Overview.” In Curini, Luigi and Robert Franzese, eds. Handbook of Research Methods in Political Science and International Relations. Thousand Oaks: Sage. pp. 461–497.
Grimmer, Justin and Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21(3):267–297.
Loughran, Tim and Bill McDonald. 2011. “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks.” The Journal of Finance 66(1, February): 35–65.
Evans, Michael, Wayne McIntosh, Jimmy Lin and Cynthia Cates. 2007. “Recounting the Courts? Applying Automated Content Analysis to Enhance Empirical Legal Research.” Journal of Empirical Legal Studies 4(4, December):1007–1039.
Project (20%) and group project (20%) in the ST.
Problem sets (60%) in the LT.
Four summative problem sets will be marked, one in each of four weeks; together these will constitute 60% of the final overall mark. The group project will be an original analysis of texts using some of the methods covered in class, written up as a report, and may focus on replicating or extending a published work. 20% of the final overall mark will be based on the subsection of the group report written by the student, and 20% of the final overall mark will be based on the collectively written sections of the group report.
Total students 2021/22: Unavailable
Average class size 2021/22: Unavailable
Capped 2021/22: No
Value: Half Unit
Personal development skills
- Team working
- Problem solving
- Application of information skills
- Application of numeracy skills
- Specialist skills