MY459 Half Unit
Special Topics in Quantitative Analysis: Quantitative Text Analysis
This information is for the 2020/21 session.
Teacher responsible: Dr Blake Miller, COL.7.14
This course is available on the MSc in Applied Social Data Science, MSc in Data Science, MSc in Human Geography and Urban Studies (Research), MSc in Political Science and Political Economy, MSc in Social Research Methods, MSc in Statistics, MSc in Statistics (Financial Statistics), MSc in Statistics (Financial Statistics) (LSE and Fudan), MSc in Statistics (Financial Statistics) (Research), MSc in Statistics (Research), MSc in Statistics (Social Statistics) and MSc in Statistics (Social Statistics) (Research). This course is available with permission as an outside option to students on other programmes where regulations permit.
The course is also available to research students as MY559.
Students must have completed Applied Regression Analysis (MY452).
The course surveys methods for systematically extracting quantitative information from text for social scientific purposes. It progresses from classical content analysis and dictionary-based methods, through classification methods, to state-of-the-art scaling methods and topic models that use statistical techniques to estimate quantities from text. Although the course lays a theoretical foundation for text analysis, its approach is primarily practical and applied, so that students learn how to use these methods in actual research. All of the methods share a common structure that can be reduced to a three-step process: first, identifying the texts and the units of text to be analysed; second, extracting quantitatively measured features from the texts (such as coded content categories, word counts, word types, dictionary counts or parts of speech) and converting these into a quantitative matrix; and third, applying quantitative or statistical methods to this matrix to generate inferences about the texts or their authors. The course surveys these methods in a logical progression, applying each technique hands-on to real texts using appropriate software.
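The three-step process can be illustrated with a deliberately tiny example. The course itself works in R (for example with quanteda, which calls the resulting matrix a document-feature matrix, or dfm); the sketch below uses only the Python standard library to show the logic. The two texts, the simple lowercase tokenizer and the hypothetical "economy" dictionary are illustrative assumptions, not course materials.

```python
import re
from collections import Counter

# Step 1: identify texts and units of analysis (here, each string is one document).
# These texts are hypothetical toy examples.
texts = {
    "doc1": "Taxes will rise and spending will fall.",
    "doc2": "We will cut taxes.",
}

def tokenize(text):
    # Simplistic tokenizer (an assumption): lowercase, keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())

# Step 2: extract features (word counts) and convert them into a matrix
# with one row per document and one column per word type.
counts = {doc: Counter(tokenize(t)) for doc, t in texts.items()}
vocab = sorted(set().union(*counts.values()))
dfm = [[counts[doc][w] for w in vocab] for doc in texts]

# Step 3: analyse the matrix. As a toy dictionary-based measure, compute the
# share of each document's tokens that fall in a hypothetical "economy" dictionary.
ECONOMY = {"taxes", "spending"}

def dictionary_share(row, vocab, dictionary):
    total = sum(row)
    hits = sum(n for w, n in zip(vocab, row) if w in dictionary)
    return hits / total if total else 0.0

scores = {doc: dictionary_share(row, vocab, ECONOMY) for doc, row in zip(texts, dfm)}
print(vocab)
print(dfm)
print(scores)
```

Real applications replace each step with something more careful (defined units of analysis, validated dictionaries, statistical models such as scaling or topic models), but the text-to-matrix-to-inference structure stays the same.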
Lectures, class exercises and homework are based on the R statistical software package but assume no prior knowledge of that language.
This course is delivered through a combination of classes and lectures totalling a minimum of 20 hours across Lent Term. This year, some or all of this teaching may be delivered through a combination of virtual classes and flipped-lectures delivered as short online videos.
This course has a reading week in Week 6 of LT.
Exercises from the computer classes can be submitted for marking.
quanteda: An R package for quantitative text analysis. http://kbenoit.github.io/quanteda/
Grimmer, Justin and Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21(3):267–297.
Loughran, Tim and Bill McDonald. 2011. “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks.” The Journal of Finance 66(1):35–65.
Evans, Michael, Wayne McIntosh, Jimmy Lin and Cynthia Cates. 2007. “Recounting the Courts? Applying Automated Content Analysis to Enhance Empirical Legal Research.” Journal of Empirical Legal Studies 4(4):1007–1039.
Project (40%, 3000 words) in the ST.
Coursework (60%, 2000 words) in the LT.
Important information in response to COVID-19
Please note that during the 2020/21 academic year some variation to teaching and learning activities may be required to respond to changes in public health advice and/or to account for the situation of students attending on campus and those studying online during the early part of the academic year. For assessment, this may involve changes to the mode of delivery and/or the format or weighting of assessments. Changes will only be made if required, and students will be notified of any changes to teaching or assessment plans at the earliest opportunity.
Total students 2019/20: 43
Average class size 2019/20: 21
Controlled access 2019/20: No
Value: Half Unit
Personal development skills
- Problem solving
- Application of numeracy skills