Hacking The Archive

This information is for the 2014/15 session.

Teacher responsible

Professor Matthew Connelly has been appointed as Philippe Roman Chair in History and International Affairs at LSE IDEAS. Dr Daniel Krasner will be a guest lecturer on the course.


This course is available on the MSc in Empires, Colonialism and Globalisation, MSc in History of International Relations, MSc in International Affairs (LSE and Peking University), MSc in International and World History (LSE & Columbia) and MSc in Theory and History of International Relations. This course is available with permission as an outside option to students on other programmes where regulations permit.


A prior knowledge of 20th century International History will be an advantage. Students unfamiliar with the subject should do some preliminary reading.

Course content

Historians now have access to unprecedentedly large and rich bodies of information generated from the digitization of older materials and the explosion of “born digital” electronic records. Machine learning and natural language processing make it possible to answer traditional research questions with greater rigor, and tackle new kinds of projects that would once have been deemed impracticable. At the same time, scholars now have many more ways to communicate with one another and the broader public, and it is becoming both easier – and more necessary – to collaborate across disciplines.

This course aims to create a laboratory organized around a common group of databases in international history which can be used for multiple research projects. Students will begin by learning about earlier methodological transformations in literary, cultural, and historical analysis, and consider whether and how the “digital turn” might turn out differently. They will then explore new tools and techniques, including named-entity extraction, text classification, topic modeling, geographic information systems, social and citation network analysis, and data visualization.
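To make one of these techniques concrete, the following is a minimal sketch of citation network analysis in Python. It assumes a citation network can be represented as a list of (citing, cited) pairs and simply counts how often each document is cited. The document identifiers and the function name `citation_counts` are invented for illustration and are not drawn from the course databases.

```python
from collections import Counter

# Hypothetical citation links: (citing document, cited document).
# These identifiers are invented for illustration only.
CITATIONS = [
    ("memo_1961_a", "cable_1960_x"),
    ("memo_1961_b", "cable_1960_x"),
    ("report_1962", "memo_1961_a"),
    ("report_1962", "cable_1960_x"),
]


def citation_counts(edges):
    """Count how many times each document is cited (its in-degree)."""
    return Counter(cited for _citing, cited in edges)


if __name__ == "__main__":
    # List documents from most cited to least cited.
    for doc, n in citation_counts(CITATIONS).most_common():
        print(doc, n)
```

Counting incoming citations is the simplest possible measure of a document's prominence. Ranking methods such as PageRank refine the same idea by weighting each citation according to the prominence of the document that makes it.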

As we turn to specific research projects, you will be able to experiment with a new approach, write a history paper in support of your eventual dissertation, or work in a group to take on a larger challenge. Individual experiments and papers could entail applying one or more of the tools that are being developed for the exploration of large collections of digitized or born-digital documents to specific historical controversies or novel historical topics. Group projects might include assembling and “cleaning” a large dataset of documents, prototyping an interactive research tool, or launching a web-based exhibit. You will be encouraged to seek out additional training, conduct experiments, and design ambitious projects that would extend beyond the life of the course.

The course is open to students with no training in statistics or computer programming. However, all participants should be open to learning the basics of scraping websites, obtaining data through an API, querying databases, and using data visualization tools. During the break between Michaelmas and Lent terms, students will be encouraged to take advantage of the many free online courses and introductory texts in computer programming.
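As an illustration of what "querying databases" can look like in practice, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `documents`, its columns, and the sample rows are assumptions made up for this example; they do not describe the databases used on the course.

```python
import sqlite3

# Hypothetical sample rows: (year, title). Invented for illustration only.
SAMPLE_ROWS = [
    (1973, "Cable on oil embargo"),
    (1973, "Memo on ceasefire talks"),
    (1974, "Report on grain exports"),
]


def build_demo_db():
    """Create an in-memory database with a small, made-up documents table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, year INTEGER, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents (year, title) VALUES (?, ?)", SAMPLE_ROWS
    )
    conn.commit()
    return conn


def documents_per_year(conn):
    """Return a list of (year, number of documents) tuples, ordered by year."""
    return conn.execute(
        "SELECT year, COUNT(*) FROM documents GROUP BY year ORDER BY year"
    ).fetchall()


if __name__ == "__main__":
    for year, count in documents_per_year(build_demo_db()):
        print(year, count)
```

The same `SELECT ... GROUP BY` pattern applies to much larger collections; only the connection details and the table layout would change.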


16 hours of seminars in the MT. 16 hours of seminars in the LT.

In Michaelmas term the seminar will meet twice a week, every other week. Tuesday meetings will consist of a seminar discussion of common readings. Wednesday classes will begin with training in new tools for historical research and then move on to individual and small group work. Students will begin to develop a project to use these tools in their own research, either individually or as part of a group that will aim to produce a new dataset, tool, or web exhibit.

In Lent term the seminar will again meet twice a week, every other week. The first week (January 17) will include a full-day workshop in which students will have the option of presenting a draft project description to experts in international history and computational methods. The experts will offer advice and assistance to help students refine their proposals and make any necessary mid-course corrections. Students will develop their papers and projects over Lent term and reconvene to present their projects and develop a plan for follow-up research.

Formative coursework

Formative coursework will consist of a project description of 6,000 words. Students will be encouraged, but not required, to submit a draft project description at the beginning of Lent term. The final version will include a statement of the research question, review of the literature, discussion of the data, research design, and an implementation plan. Students will have the option of joining with one or more colleagues in submitting a group project description. A group project may entail a “proof of concept,” such as a cleaned and curated digital archive, a functioning prototype tool, or a web-based exhibit.

Indicative reading

Erez Lieberman Aiden, et al., "Quantitative Analysis of Culture Using Millions of Digitized Books"

David Allen and Matthew Connelly, “Diplomatic History After the Big Bang: Using Computational Methods to Explore the Infinite Archive,” in Frank Costigliola and Michael Hogan, eds., Explaining the History of American Foreign Relations (forthcoming)

David Blei, “Probabilistic Topic Models,” Communications of the ACM 55

Daniel Cohen and Roy Rosenzweig, Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web (http://chnm.gmu.edu/digitalhistory/book.php)

Geoff Cunfer, On the Great Plains (College Station, 2005)

“Interchange: The Promise of Digital History,” Journal of American History 95 (September 2008): TK

Matthew Jockers, Macroanalysis (Urbana: University of Illinois Press, 2013).

William McAllister, “The Documentary Big Bang, the Digital Records Revolution, and the Future of the Historical Profession,” Passport 41:2 (September 2010): 12-17.

Franco Moretti, Graphs, Maps, Trees

Roy Rosenzweig, “Scarcity or Abundance? Preserving the Past in a Digital Era,” American Historical Review 108 (2003): 735-762

  • Statement of Thomas Blanton, Director, National Security Archive, George Washington University To the Committee on the Judiciary U.S. House of Representatives Hearing on the Espionage Act and the Legal and Constitutional Implications of Wikileaks
  • Vannevar Bush, "As We May Think"
  • Franco Moretti, “Operationalizing: Or, the Function of Measurement in Literary Theory,” New Left Review 84 (November-December 2013)
  • F. Mosteller and D. Wallace, “Inference in an Authorship Problem”
  • Ted Nelson, The Tyranny of the File
  • Lawrence Page and Sergey Brin, "The PageRank Citation Ranking: Bringing Order to the Web"
  • David Pozen, “Deep Secrecy”
  • Stephen Ramsay, "Databases"
  • Marc Trachtenberg, Declassification Analysis
  • Joshua Sternfeld, “Archival Theory and Digital Historiography: Selection, Search, and Metadata as Archival Processes for Assessing Historical Contextualization,” The American Archivist 74 (2011): 544-575
  • Kate Theimer, “Archives in Context and as Context,” Journal of Digital Humanities Vol. 1, No. 2 (Spring 2012)


This course is not assessed by the School.

Key facts

Department: International History

Total students 2013/14: Unavailable

Average class size 2013/14: Unavailable

Controlled access 2013/14: No

Lecture capture used 2013/14: No

Value: Non-assessed

Personal development skills

  • Leadership
  • Self-management
  • Team working
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Commercial awareness
  • Specialist skills