Transcribe Bentham: Global community support harnessed to transcribe 13,532 documents (and counting...)
AHRC funding was used to develop Transcribe Bentham, a pioneering crowdsourced transcription initiative that enables non-academic users across the globe to take part in transcribing the Bentham Papers: around 60,000 manuscript folios written and composed by the philosopher and reformer Jeremy Bentham (1748–1832).
The manuscripts are of enormous international historical and philosophical importance and contain Bentham’s influential writings on topics ranging from politics, law, and religion to sexuality and animal welfare. Thanks to Transcribe Bentham, an ever-increasing amount of this archive is being made freely available online, greatly widening engagement among scholars and lay users alike.
In 1959 the Bentham Project was founded at UCL and began work on the Collected Works of Jeremy Bentham, a critical scholarly edition of Bentham’s writings based both on texts published in his lifetime and on unpublished works that survive in manuscript. Having become Director of the Bentham Project in 2001, Professor Philip Schofield secured AHRC funding in 2003–6 to create a searchable catalogue of the archive’s content, both to facilitate this editorial work and to make the collection easier to navigate. In 2010–11 Professor Schofield received a further AHRC award through which he, in collaboration with several other UCL departments and the University of London Computer Centre, developed Transcribe Bentham’s crowdsourced transcription platform, the Transcription Desk.
The Transcription Desk was the first tool of its kind to be used for crowdsourcing the transcription of such complex manuscripts, and Transcribe Bentham itself was conceived as a trial. It set out to evaluate whether non-specialist volunteers could read Bentham’s handwriting, cope with the manuscripts’ composition and with his style and challenging ideas, and still produce transcripts of a high enough standard to form the basis of editorial work and of public searching in an online repository. It also tested whether non-specialists could add the required Text Encoding Initiative (TEI)-compliant XML, a de facto standard for encoding electronic texts, to a sufficiently high standard.
Alongside the tool, the team worked on outreach and on developing easily understood guides and tutorials for users, offering volunteers the opportunity to learn new skills and engaging a large new audience with the material. As of August 2015, 493 individuals had contributed to the project, more than rising to the challenge posed to them. Together they had produced 13,532 full or partial transcripts, 94% of which had been checked and approved by the project team as being of a high enough standard for editorial work and for display in UCL’s online Bentham Papers repository. The team also identified a small group of ‘super transcribers’ among the volunteers: twenty-six individuals who had contributed 96% of all transcripts submitted. Six of these ‘super transcribers’ had each worked on more than 1,000 transcripts, and one had contributed more than 2,000. If transcription continues at the current rate, it is estimated that the entire Bentham Papers collection could be fully transcribed and available within 10–15 years; had Transcribe Bentham not been developed, it is anticipated that the collection would not have been fully transcribed until 2085 at the earliest.
Once the tool had proved an effective method of crowdsourcing the transcription of Bentham’s manuscripts, further funding was secured from the Andrew W. Mellon Foundation’s ‘Scholarly Communications’ programme. This allowed the project team to make improvements to the Transcription Desk requested by users, and to add the British Library’s Bentham manuscripts, thereby reuniting his papers digitally for the first time since his death. The British Library’s manuscripts contain a great deal of Bentham family correspondence, which helped to recruit several more ‘super transcribers’. The Transcription Desk, in whole or in part, was also adapted for use by other crowdsourcing projects, namely the Letters of 1916 project in Dublin and the Edvard Munch Archive in Oslo. In addition, transcripts produced by volunteers have been used as ‘ground truth’ training data for Handwritten Text Recognition (HTR) models in the European Commission-funded tranScriptorium project, and a modified Transcription Desk is being tested to evaluate how HTR can support users in crowdsourced transcription. Originally funded for three years, tranScriptorium has been so successful that the international consortium has gained a further three years of EU funding to expand the project group and to develop further ways of exploiting this technology, in a programme entitled Recognition and Enrichment of Archival Documents (READ).
Beyond this further funding, Transcribe Bentham has been recognised with an Award of Distinction in the Digital Communities category of the 2011 Prix Ars Electronica, the world’s foremost digital arts competition. The project was also nominated for a Digital Heritage award in 2011, and came second in the Knetworks “Platform for Networked Innovation Competition” in 2012.
For more information on the project visit: www.ucl.ac.uk/Bentham-Project/transcribe_bentham
Gateway to Research Project Links: The Bentham Papers Transcription Initiative, March 2010 – April 2011