Purpose - The purpose of this paper is to address the most urgent challenges that libraries face in the mass digitization of historical printed text: the unsatisfactory results of converting scanned images to full-featured electronic text by automated optical character recognition (OCR); the historical language barrier around 1850, caused by the inadequacy of most existing lexica of historical language for OCR or post-correction; and a lack of institutional knowledge and expertise in libraries, museums and archives.
Design/methodology/approach - In the EC-funded project IMPACT (Improving Access to Text), seven libraries, six research institutes and two private sector companies across Europe work together to address these challenges by developing OCR software and technologies that significantly exceed the accuracy of current state-of-the-art software. The IMPACT solutions focus on the entire recognition process after the document leaves the scanner: image processing, OCR processing (including the use of dictionaries), OCR correction and document formatting. IMPACT will also build capacity in mass digitization by sharing best practice and expertise with the cultural heritage communities in Europe.
Findings - Technical results will include toolkits for image enhancement and segmentation, an adaptive OCR engine and several prototypes of experimental OCR engines, computational lexica, and several post-correction modules, including a web-based collaborative correction system and a parser for structural metadata. Strategic tools include several decision support tools, guidelines, a web site with a demonstrator platform, a training programme and, ultimately, a sustainable Centre of Competence for mass digitization in Europe.
Originality/value - The IMPACT solutions will, for the first time, allow large amounts of digitized historical text to be transformed into electronic text with a minimum of manual intervention and significantly improved accessibility for the user.
The National Library of the Netherlands (Koninklijke Bibliotheek, hereafter KB) has been innovating its services and organization for the past 20 years and expects to continue to do so. The central question in this article is: what makes innovation work in the organization of the KB? We focus on two use cases: the development of the recently opened Delpher portal, which gives access to 30 million pages of digitized Dutch heritage, and the current development of the KB ResearchLab, which gives internal and external researchers a platform for experiments. A review of innovation theory and practice (Balk 2013) provides a checklist of factors that determine the innovation capacity of a library, grouped into four themes: leadership and culture, knowledge and organizational learning, collaboration capacity, and organizational design. By applying this innovation checklist to the two use cases, we hope to contribute to the body of best practice in innovation in national libraries. Finally, we look ahead to the development of the National Digital Library of the Netherlands, which will integrate services for the public library community into the KB in the near future, and share some potential scenarios for the future of the library landscape in the Netherlands.