The Biodiversity Heritage Library (BHL) will soon upload its 60 millionth page of open access biodiversity literature to the BHL website and the BHL's Internet Archive Collection. The BHL’s massive repository of free knowledge includes content that is available nowhere else online, as well as accessible versions of content that are locked behind paywalls elsewhere. If we are to continue to expand our understanding of life on Earth, we must ensure that the foundation of biodiversity knowledge provided by BHL is discoverable by the tools we rely on to navigate the ever-expanding internet. These tools, namely search engines and their algorithms, preferentially deliver (and rank) content with good metadata and persistent identifiers (PIDs). In modern online publishing, PID assignment and linking happen at the point of publication: DOIs (Digital Object Identifiers) for publications, ORCIDs (Open Researcher and Contributor IDs) for people, and RORs (Research Organization Registry IDs) for organisations. The DOI system provided by Crossref (the DOI registration agency for scholarly content) delivers reciprocal citations, enabling convenient clicking from article to article, and citation tracking, enabling authors and institutions to track the impact and reach of their research output. Publications that lack PIDs, which include the vast majority of legacy literature, are hard to find and sit outside the linked network of scholarly research. This makes it nearly impossible to determine whether they are being cited, let alone viewed, mentioned, shared or liked. At TDWG 2020 (Page 2020, Kearney 2020, Richard 2020), and previously at TDWG 2019 (Page 2019b, Page 2019a, Kearney 2019b, Kearney 2019a) and TDWG 2018 (Kearney 2018), we emphasised the need to bring the historic biodiversity literature into the modern linked network of scholarly research. In October 2020, BHL launched a new working group to do exactly this.
The BHL Persistent Identifier Working Group (Team #RetroPID) brings together expertise from across BHL’s global community. Over the past year, we have worked tirelessly to make it easier to find, cite, link, share and track the content on BHL, adding article-level metadata to journals and retrospectively assigning DOIs (#RetroPIDs). Most importantly, we have developed the tools and documentation that will enable the entire BHL community to take contributed content from “just” accessible to persistently discoverable. This paper will detail our efforts to retrofit the historic literature (a square peg) into the modern PID system (a round hole) and will present both the achievements and the challenges of this important work.
Over the last two decades, libraries and archives of natural history museums and botanical gardens in the US have invested major effort in digitizing their holdings. However, delivering these digitized resources from individual repositories to a wider community of researchers is challenging. Many of the primary resources are handwritten, which limits their use and reuse: cursive writing and personal shorthand are difficult to decipher, and the documents mostly lack machine-readable data. This paper presents three case studies from the Harvard University Herbaria (HUH) Botany Libraries and the Harvard University Ernst Mayr Library and Archives (EMLA) of the Museum of Comparative Zoology (MCZ) that use crowdsourcing, detailed access and discovery tools, and open access platforms to make handwritten materials more accessible to researchers by bridging content across collections held within and outside Harvard University. The case studies show that different approaches can create opportunities for data mining, because transcription of handwritten documents and enhanced metadata make it possible to search previously unavailable words and phrases, such as taxonomic names. Content contributed to the Biodiversity Heritage Library (BHL), and the tools and services available in BHL, were integral to the work. The results show how information held in natural history libraries and archives contributes to the expansion of scientific and cultural historical knowledge by increasing access to previously unavailable historical scientific information through digitization, metadata enhancement and transcription.