TRANSCRIPTORIUM is a three-year project that aims to develop innovative, cost-effective solutions for the indexing, search and full transcription of historical handwritten document images, using Handwritten Text Recognition (HTR) technology. Producing the ground truth (GT) for a dataset of handwritten document images is among its first tasks. We address novel approaches for producing this GT faster, based on crowdsourcing and on prior-knowledge methods. We also present a novel, low-cost, semi-supervised procedure for obtaining correctly aligned pairs of detected/extracted text-line images and their line-level transcripts, which is especially suitable for training the models used by the HTR technology employed in TRANSCRIPTORIUM.
The Bentham Papers Transcription Initiative (Transcribe Bentham for short) is an award-winning crowdsourced manuscript transcription initiative that engages students, researchers, and the general public with the thought and life of the philosopher and reformer Jeremy Bentham (1748-1832) by making digital images of his manuscripts available for anyone, anywhere in the world, to transcribe. Since its launch in September 2010, volunteers have transcribed over 2.6 million words. This paper examines Transcribe Bentham's contribution to humanities research and to the burgeoning field of digital humanities. It then discusses the potential for the project's volunteers to make significant new discoveries among the vast Bentham Papers collection, and examines several examples of interesting material transcribed by volunteers thus far. We demonstrate that a crowdsourced initiative such as Transcribe Bentham can open activities traditionally viewed as academic endeavours to a wider audience interested in history, whilst uncovering important new historical primary source material. In addition, we see this as a shift in focus for those involved in digital humanities, highlighting the possibilities of using online and social media technologies for user engagement and participation in cultural heritage.
In recent years, important research on crowdsourcing in the cultural heritage sector has been published, dealing with topics such as the quantity of contributions made by volunteers, the motivations of those who participate in such projects, the design and establishment of crowdsourcing initiatives, and their public engagement value. This article addresses a gap in the literature and seeks to answer two key questions about crowdsourced transcription: (1) whether volunteers' contributions are of a high enough standard for creating a publicly accessible database and for use in scholarly research; and (2) whether crowdsourced transcription makes economic sense, and whether the investment in launching and running such a project can ever pay off. In doing so, this article takes as its case study the award-winning crowdsourced transcription initiative Transcribe Bentham, which began in 2010. It examines a large data set, namely 4,364 checked and approved transcripts submitted by volunteers between 1 October 2012 and 27 June 2014. These data include metrics such as the time taken to check and approve each transcript and the number of alterations made to each transcript by Transcribe Bentham staff. These data are then used to evaluate the long-term cost-effectiveness of the initiative and its potential impact on the ongoing production of The Collected Works of Jeremy Bentham at UCL. Finally, the article proposes more general points about successfully planning humanities crowdsourcing projects, and provides a framework for evaluating both the quality of their outputs and the efficiency of their cost structures.