KEYWORDS Drupal, libraries, linked data, linked open data, Semantic Web
The Smithsonian Libraries is the world's largest museum library system, with 22 physical locations, more than 1.9 million collection items, and a website that receives more than 1.2 million unique visitors per year. Its primary focus is supporting the research of the scientists, curators, and museum specialists who work at the Smithsonian, whose fields of study range from anthropology and American art to the history of technology and zoology. The system has, however, always had a robust Web presence that targets a much broader audience and seeks to make its collections available to researchers
© Keri Thompson and Joel Richard
This paper assumes that the reader has some familiarity with the concepts of linked data as described by Heath and Bizer (2011), and a general awareness of the terminology and concepts behind RDF, triplestores, and RDFa (RDF in attributes), as described by Manola and Miller (2004).
As the world’s largest open access digital library for biodiversity literature and archives, the Biodiversity Heritage Library (BHL) provides researchers around the world with access to over a quarter-million volumes of natural history literature. One of its services is indexing the taxonomic names in the collection so that researchers can locate publications about specific taxa. The Global Names Architecture (GNA) is a system of web services to register, find, index, check, and organize biological scientific names. GNA recently developed a new Name Finding algorithm and tool, which have been integrated with BHL to improve taxonomic name searches. In our presentation, we will discuss the history of name finding in BHL, the development of the Name Finding algorithm, the results of implementing it, and the challenges that still await us in taxonomic name finding in BHL.
The Biodiversity Heritage Library (BHL) will soon upload its 60 millionth page of open access biodiversity literature to the BHL website and BHL's Internet Archive Collection. BHL’s massive repository of free knowledge includes content that is available nowhere else online, as well as accessible versions of content that is locked behind paywalls elsewhere. If we are to continue to expand our understanding of life on Earth, we must ensure that the foundation of biodiversity knowledge provided by BHL is discoverable by the tools we rely on to navigate the ever-expanding internet. These tools – search engines and their algorithms – preferentially deliver (and rank) content with good metadata and persistent identifiers (PIDs). In modern online publishing, PID assignment and linking happen at the point of publication: DOIs (Digital Object Identifiers) for publications, ORCIDs (Open Researcher and Contributor IDs) for people, and RORs (Research Organization Registry IDs) for organisations. The DOI system provided by Crossref (the DOI registration agency for scholarly content) delivers reciprocal citations, enabling convenient clicking from article to article, and citation tracking, enabling authors and institutions to track the impact and reach of their research output. Publications that lack PIDs, which include the vast majority of legacy literature, are hard to find and sit outside the linked network of scholarly research. This makes it nearly impossible to determine whether they are being cited, let alone viewed, mentioned, shared, or liked. At TDWG 2020 (Page 2020, Kearney 2020, Richard 2020), 2019 (Page 2019b, Page 2019a, Kearney 2019b, Kearney 2019a), and 2018 (Kearney 2018), we emphasised the need to bring the historic biodiversity literature into the modern linked network of scholarly research. In October 2020, BHL launched a new working group to do exactly this.
The BHL Persistent Identifier Working Group (Team #RetroPID) brings together expertise from across BHL’s global community. Over the past year, we have worked tirelessly to make it easier to find, cite, link, share and track the content on BHL, adding article-level metadata to journals and retrospectively assigning DOIs (#RetroPIDs). Most importantly, we have developed the tools and documentation that will enable the entire BHL community to take contributed content from “just” accessible to persistently discoverable. This paper will detail our efforts to retrofit the historic literature (a square peg) into the modern PID system (a round hole) and will present both the achievements and the challenges of this important work.
In 1996, Smithsonian Libraries (SIL) embarked on the digitization of its collections. By 1999, a full-scale digitization center was in place, and rare volumes from the natural history collections, often of high illustrative value, were the focus of the program's first years. The resulting beautiful books made available for online display were successful to a certain extent, but it soon became clear that the data locked within the texts needed to be converted into a more usable and re-purposable form, using digitization methods that went beyond simple page imaging to include text conversion. Library staff met with researchers from the taxonomic community to understand their path to the literature and identified the tools (indexes and bibliographies) they used to connect to library holdings. The traditional library metadata describing the titles, which made them easy to retrieve from library shelves, did not meet the needs of researchers looking for more detailed and granular data within the texts. The outcome was to identify print tools that could potentially assist researchers if made available in digital form. This paper outlines the project undertaken to convert Charles Davies Sherborn's Index Animalium into a tool that connects researchers to library holdings: from a print index, to a database, and eventually to a dataset.
Sherborn's microcitations of species names and his bibliographies help bridge the gap between taxonomists and the literature holdings of libraries. In 2004, SIL received funding from the Smithsonian's Atherton Seidell Endowment to create an online version of Sherborn's Index Animalium. The initial project was to digitize the page images and re-key the data into a simple data structure. As the project evolved, a more complex database was developed, which enabled high-quality field searching to retrieve species names and to search the bibliography.
Problems with inconsistent abbreviations and styling of his bibliographies made the parsing of the data difficult. Coinciding with the development of the Biodiversity Heritage Library