MY559 Half Unit
Computational Text Analysis and Large Language Models
This information is for the 2025/26 session.
Course Convenor
Ryan Hubert
Availability
This course is available on the MPhil/PhD in Economic Geography, MPhil/PhD in Environmental Economics, MPhil/PhD in International Relations, MPhil/PhD in Regional and Urban Planning Studies, MRes in Management (Employment Relations and Human Resources) and MRes in Management (Organisational Behaviour). This course is freely available as an outside option to students on other programmes where regulations permit. It does not require permission.
This course is not controlled access. If you register for a place and meet the prerequisites, if any, you are likely to be given a place.
Requisites
Additional requisites:
Applied Regression Analysis (MY452) or equivalent is required. Students should have foundational mathematical proficiency through basic linear algebra. All students are required to complete the pre-sessional Digital Skills Lab course on programming (information will be available on the course Moodle page). Already knowing how to code in at least one programming language will be very helpful, but after completing the pre-sessional course, MY559 is suitable for students without coding experience. However, students in this situation should be prepared to invest additional time learning to code during the term.
Course content
This course introduces computational approaches to analysing text, emphasising how these methods can be used to investigate social phenomena and offering an overview of the tools that students can apply in academic research, policy analysis, or industry roles in data science. Students learn to quantify and empirically model texts using techniques such as: dictionary methods for measuring sentiment, topic modelling for extracting document topics, scaling for identifying ideological rhetoric, supervised classification for categorising large numbers of texts, word embeddings for modelling the contextual meaning of words, and large language models for a range of possible applications.
A central focus is on evaluating the validity of these methods: what they measure, how results should be interpreted, and what kinds of inferences they can support. To help build deeper conceptual understanding of how these methods enable learning about the social world, the course develops a range of applied technical skills, including programming, data wrangling, and the use of generative AI. Students gain practical experience through hands-on exercises designed to prepare them to use these techniques in academic research and beyond. By the end of the course, students will have developed a strong command of the logic and theory underpinning a wide range of computational techniques for analysing texts, be able to critically assess their uses and limits, and be equipped to apply them to real-world datasets.
Teaching
20 hours of lectures and 10 hours of seminars in the Winter Term.
This course has a reading week in Week 6 of Winter Term.
Formative assessment
Students will complete exercises during seminars and one take-home assignment during Winter Term.
Indicative reading
Benoit, Kenneth (2020). “Text as Data: An Overview.” In Curini, Luigi and Robert Franzese, eds. Handbook of Research Methods in Political Science and International Relations. Sage Publications. pp. 461-497.
Grimmer, Justin, Margaret E. Roberts and Brandon M. Stewart (2022). Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press.
Jurafsky, Daniel and James H. Martin (2024). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models. 3rd edition. https://web.stanford.edu/~jurafsky/slp3/
quanteda: An R package for quantitative text analysis. http://kbenoit.github.io/quanteda/
Assessment
Exam (100%), duration: 120 minutes, in the Spring exam period
Key facts
Department: Methodology
Course Study Period: Winter Term
Unit value: Half unit
FHEQ Level: Level 8
CEFR Level: Null
Total students 2024/25: 5
Average class size 2024/25: 2
Controlled access 2024/25: No
Course selection videos
Some departments have produced short videos to introduce their courses. Please refer to the course selection videos index page for further information.