Collections containing a large number of short documents are becoming increasingly common. As these collections grow in number and size, providing effective retrieval of brief texts presents a significant research problem. We propose a novel approach to improving information retrieval (IR) for short texts based on aggressive document expansion. Starting from the hypothesis that short documents tend to be about a single topic, we submit documents as pseudo-queries and analyze the results to learn about the documents themselves. Document expansion helps in this context because short documents yield little in the way of term frequency information. However, as we show, the proposed technique helps us model not only lexical properties, but also temporal properties of documents. We present experimental results using a corpus of microblog (Twitter) data and a corpus of metadata records from a federated digital library. With respect to established baselines, results of these experiments show that applying our proposed document expansion method yields significant improvements in effectiveness. Specifically, our method improves the lexical representation of documents and the ability to let time influence retrieval.
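The pseudo-query idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the retrieval model, the number of retrieved documents, or the way results are combined, so the IDF-weighted overlap score, the cutoff `k`, and the mixing weight `mix` below are all assumptions chosen for illustration. Only the lexical side is shown; the temporal modelling mentioned in the abstract is left out.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; any tokenizer would do)."""
    return text.lower().split()


def term_dist(tokens):
    """Maximum-likelihood term distribution for one document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def idf_table(corpus_tokens):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(corpus_tokens)
    doc_freq = Counter(t for tokens in corpus_tokens for t in set(tokens))
    return {t: math.log(1 + n_docs / df) for t, df in doc_freq.items()}


def score(query_tokens, doc_tokens, idf):
    """Hypothetical scoring function: IDF-weighted term overlap."""
    doc_counts = Counter(doc_tokens)
    return sum(idf.get(t, 0.0) * doc_counts[t] for t in set(query_tokens))


def expand(doc_index, corpus, k=2, mix=0.5):
    """Submit one document as a pseudo-query and blend its term
    distribution with those of the top-k retrieved documents."""
    corpus_tokens = [tokenize(doc) for doc in corpus]
    idf = idf_table(corpus_tokens)
    query = corpus_tokens[doc_index]

    # Rank every other document against the pseudo-query, keep the top k hits.
    ranked = sorted(
        ((score(query, tokens, idf), i)
         for i, tokens in enumerate(corpus_tokens) if i != doc_index),
        reverse=True,
    )[:k]
    ranked = [(s, i) for s, i in ranked if s > 0]

    original = term_dist(query)
    total_score = sum(s for s, _ in ranked)
    if total_score == 0:
        return original  # nothing retrieved, so nothing to expand with

    # Weight each retrieved document by its share of the retrieval score.
    neighbours = defaultdict(float)
    for s, i in ranked:
        for term, prob in term_dist(corpus_tokens[i]).items():
            neighbours[term] += (s / total_score) * prob

    terms = set(original) | set(neighbours)
    return {
        t: (1 - mix) * original.get(t, 0.0) + mix * neighbours.get(t, 0.0)
        for t in terms
    }


if __name__ == "__main__":
    toy_corpus = [
        "obama speech tonight",
        "obama gives speech on economy tonight",
        "cat video funny",
        "president obama economy speech",
    ]
    for term, prob in sorted(expand(0, toy_corpus).items()):
        print(f"{term}\t{prob:.3f}")
```

In this toy setting the expanded representation of the first document gains terms from the documents it retrieves while staying a proper probability distribution, which is the sense in which expansion compensates for sparse term frequency information in short texts.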
The growth and evolution of digital scholarship in the humanities have produced new genres of scholarly work and publication, reliant upon new ways of representing and sharing evidence, analysis, and interpretation. Meanwhile, extant systems of scholarly communication, including publication, discovery, access-provision, maintenance, and preservation, too often exclude digital research products, to the potential detriment of the entire scholarly record. This paper considers one genre of digital humanities scholarship: the thematic research collection, a digital collection of primary sources gathered to support research on a theme. This genre is recognizable and increasingly common, yet wildly heterogeneous in precise form, function, and purpose. This typological analysis aims to identify and describe types of collections as a way toward comprehending the range, variation, and complexity of the whole genre. The research considers what thematic research collections are, how they work, and what challenges confront the provision of effective and ongoing access to digital scholarship.
To realize the great potential value of large‐scale digital libraries, we need a fuller understanding of the range of ways in which scholarly communities conduct research, or want to conduct research, within them. Scholars build collections in the course of their work. How can we anticipate and support various kinds of collection‐building and ‐use, in order to support the diversity of researchers who work in libraries of digital books? This paper reports selected results of a study of how potential user groups of the HathiTrust Digital Library create and use collections in their research. This study aims to contribute to our broader understanding of scholarly practice, particularly of humanities scholars’ collecting activities. The results of the study inform ongoing work to develop a workset‐creation tool for the HathiTrust Research Center.
At present there are no established collection development methods for building large-scale digital aggregations. However, to realize the potential of the collective base of digital content and advance scholarship, aggregations must do more than provide search of sizable bodies of content. Informed by empirical understanding of scholarly information practices, the IMLS Digital Collections and Content project developed an aggregation strategy for building Opening History, one of the largest digital cultural heritage aggregations in the country. The strategy applied policy-driven collecting, based on the principle of contextual mass, and conspectus-style evaluation of collection-level metadata to identify strong subject areas within the aggregation. Analysis of density, interconnectedness, diversity, and small/large collection complementarity determined subject concentrations and thematic strengths to be prioritized for future collection development and used as organizational structures for browsing and visualization. The approach models how scholars build their own personal research collections, as they follow leads from collection to collection across institutions near and far, and adds value that cannot be achieved through conventional retrieval and browsing at the item-level.
In library and information science (LIS), research on interdisciplinarity is concerned with optimizing information resources, systems, and services for researchers working across disciplinary boundaries. Research libraries are responding to the rapid rise in interdisciplinary scholarship and the advances in digital content, technologies, and infrastructure that accompany an emerging data-intensive research paradigm. This chapter considers two key areas in LIS that inform current practice in research libraries—bibliometrics and information practices research. Bibliometric approaches investigate the patterns and flows of information among disciplines, and information practices research examines the activities and materials involved in the conduct of interdisciplinary work. As technical advances continue to solve problems in navigation and retrieval of information across disciplinary boundaries, the greatest challenge will be to assure the meaning and validity of newly created interdisciplinary knowledge through information systems that can sustain the increasingly long and mutable information paths back to our disciplinary intellectual foundations.