Transcribing documents from the printing press era, a challenge in its own right, is more complicated when documents interleave multiple languages, a common feature of 16th-century texts. Additionally, many of these documents precede consistent orthographic conventions, making the task even harder. We extend the state-of-the-art historical OCR model of Berg-Kirkpatrick et al. (2013) to handle word-level code-switching between multiple languages. Further, we enable our system to handle spelling variability, including now-obsolete shorthand systems used by printers. Our results show average relative character error reductions of 14% across a variety of historical texts.
Since mid-2005, archivist-activists at the Historical Archive of the National Police of Guatemala have been digitizing a century's worth of previously suppressed police records so as to protect, mobilize and provide access to them: 23 million pages to date. We find that digitization amplified the staff's repurposing of the archive to serve victims of human rights violations. Digitization enhances short- and long-term safeguards for the archive's physical integrity, probative value and enduring accessibility, but has required critical human factors and institutional solidarity, most notably partnerships with international donors and allied organizations, and Guatemalan nongovernmental organizations. Finally, technology offers a lens to analyze the persistent challenges to promoting truth and justice in Guatemala. We show how simple, often ad hoc approaches to digitization developed under political urgency can have an irreversible impact when used to amplify a unified mission driven by a committed community of archival workers.
throughout this project. We extend special thanks to our University of Texas (UT) colleagues, especially everyone on the LLILAS Benson digital initiatives team, as well as all involved in hosting the celebratory event between the AHPN and UT in July 2018. Finally, we are indebted to Kentaro Toyama and Kirsten Weld for valuable input and support during the research and writing process. We also thank the anonymous reviewers for thoughtful feedback.
Libraries Bibliography," which helped flesh out our literature review. Everyone who contributed to the conversation about the ethics of teaching with digitized primary sources on Twitter,
LLILAS Benson Latin American Studies and Collections at the University of Texas at Austin applies post-custodial archival methods in pursuit of a new vision of digital archival practice and the transnational construction of historical memory. This work seeks to develop a practice for digital archiving that enables the redistribution of resources while centering communities as contributors and owners of their own documentary heritage. Although LLILAS Benson has successfully built partnerships and continues to manage widely recognized collections using a post-custodial model, the anti-colonial framework through which this work has been understood does not fully account for the power imbalances at play. Using Cifor and Lee’s survey of neoliberalism in the archives as a launching point, this article considers how neoliberalism has shaped post-custodial practices at LLILAS Benson, focusing on ideas and practices of labor, digitization, and the common good. Through this analysis, the authors describe not a static set of methodologies, but rather an ongoing process of learning, unlearning, and restructuring in pursuit of a collective good.
Pre-print first published online 03/03/2018