As an increasing number of digital library projects embrace the harvesting of item-level descriptive metadata, issues of description granularity and concerns about the potential loss of context during harvesting take on greater significance. Collection-level description can provide valuable context for item-level metadata records harvested from disparate and heterogeneous providers. This paper describes an ongoing experiment using collection-level description in concert with item-level metadata to improve the quality of search and discovery across an aggregation of metadata describing resources held by a consortium of large academic research libraries. We present details of approaches implemented so far and preliminary analyses of the potential utility of these approaches. The paper concludes with a brief discussion of related issues and future work plans.
The HathiTrust Digital Library (HTDL) is a digital library containing about 14 million volumes, which together comprise billions of pages of content. The HathiTrust Research Center (HTRC) is a collaborative research initiative jointly led by Indiana University and the University of Illinois at Urbana-Champaign. This paper describes the development of a collections data model by the Workset Creation for Scholarly Analysis project, an HTRC research initiative funded by the Andrew W. Mellon Foundation. The resulting HTRC Workset data model is designed to aid humanities scholars by helping them to describe selected portions of the HTDL corpus that serve as the objects of their research. The resulting worksets are persistent, citable, and can be assessed by other scholars for reuse in additional research processes.
Linked Data provides a conceptual foundation for creating unified views across Digital Libraries, but implementation challenges must be overcome to realize the vision of computationally assisted cross‐corpus research. We report practical experiences comparing two alternative workset building approaches across combined datasets: the HathiTrust Digital Library and the Early English Books Online Text Creation Partnership. In one experiment we combine both datasets within one triplestore using a single ontology and apply consolidated querying; in the other we build two distributed triplestores, with each dataset conforming to its own ontology, and connect them through federated querying. Each solution presents tradeoffs in complexity, system efficiency and responsiveness, and in the workload of configuring new methods providing access to Digital Libraries. We demonstrate that choosing a consolidated or federated approach fundamentally alters the dataset configuration process for cross‐corpus workset building, and so the choice should be considered early in deployment specification and design. As both approaches provide equivalent functionality to the end‐user, the practice and experience documented here inform the design and development of distributed Linked Data Digital Libraries offering combined collection querying.
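The contrast between consolidated and federated querying can be illustrated with a minimal sketch that only builds SPARQL query text. It does not reproduce the queries, ontologies, or endpoints used in the study. The endpoint URLs, the function names, and the use of Dublin Core properties as stand-ins for the two ontologies are all assumptions made for illustration; only the `SERVICE` and `UNION` keywords come from the SPARQL 1.1 standard.

```python
# Illustrative only: endpoints and vocabularies below are hypothetical
# stand-ins, not the ones used in the experiments described above.
HTDL_ENDPOINT = "http://example.org/htdl/sparql"      # assumed URL
EEBO_ENDPOINT = "http://example.org/eebo-tcp/sparql"  # assumed URL

PREFIXES = (
    "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
    "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
)


def consolidated_query(creator: str) -> str:
    """One triplestore, one ontology: a single pattern covers both corpora."""
    return (
        PREFIXES
        + "SELECT ?work ?title WHERE {\n"
        + f'  ?work dcterms:creator "{creator}" ;\n'
        + "        dcterms:title ?title .\n"
        + "}"
    )


def _service_block(endpoint: str, ns: str, creator: str) -> str:
    """Wrap a pattern, phrased in one store's vocabulary, in a SERVICE clause."""
    return (
        f"  {{ SERVICE <{endpoint}> {{\n"
        f'      ?work {ns}:creator "{creator}" ;\n'
        f"            {ns}:title ?title .\n"
        "  } }\n"
    )


def federated_query(creator: str) -> str:
    """Two triplestores: each pattern uses its own store's terms, joined by UNION."""
    return (
        PREFIXES
        + "SELECT ?work ?title WHERE {\n"
        + _service_block(HTDL_ENDPOINT, "dcterms", creator)
        + "  UNION\n"
        + _service_block(EEBO_ENDPOINT, "dc", creator)
        + "}"
    )


if __name__ == "__main__":
    print(consolidated_query("Milton, John"))
    print(federated_query("Milton, John"))
```

In the consolidated form the mapping to a single ontology happens at load time, so the query stays simple. In the federated form the query itself must carry one pattern per store. The sketch interpolates the creator string directly, so a value containing a double quote would need escaping before use against a real endpoint.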