“Context” is an elusive concept in Information Science: often invoked, yet rarely explained. In this paper we take a domain analytic approach to examine five sub‐disciplines within Earth Systems Science to show how the contexts of data production and use impact the value of data. We argue simply that the value of research data increases with their use. Our analysis is informed by two economic perspectives: first, that data production needs to be situated within a broader information economy; and second, that the concept of anti‐fragility helps explain how data increase in value through exposure to diverse contexts of use. We discuss the importance of these perspectives for the development of information systems capable of facilitating interdisciplinary scientific work, as well as the design of sustainable cyberinfrastructures.
In the realm of scholarly communication, scientific datasets are becoming more widely recognized for their scholarly and reuse value. However, despite the investment in maintaining and storing research data for long-term access, there is no clear strategy or metric for determining the reuse of research datasets. This study proposes a novel approach to track use and measure the impact of publicly accessible datasets in scholarly publications through disciplinary reach: the number of unique journals and related subject categorizations in which articles are published. Using affiliated publication(s), described by the author as the works identified by the dataset creator or curator as related to a dataset, the principles underlying the bibliometric technique of citation analysis are leveraged and applied. Preliminary results show that for earth science datasets, affiliated publications were primarily cited in physical science and multidisciplinary journals, indicating these datasets may have an impact on a number of different research areas. Continued refinement of these approaches, measures, and the design will serve to broaden our understanding of the reuse potential of scientific data and their influence on advancing scholarship.
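As a rough illustration of the disciplinary reach measure described above, the following minimal sketch counts the unique journals and subject categories among articles citing a dataset's affiliated publications. The function name, the record fields (`journal`, `categories`), and the sample records are all hypothetical; the abstract does not specify a data format, and the values are made up rather than taken from the study.

```python
def disciplinary_reach(citing_articles):
    """Count unique journals and subject categories among citing articles.

    citing_articles: iterable of dicts with the assumed (hypothetical) keys
    'journal' (str) and 'categories' (list of str).
    """
    journals = set()
    categories = set()
    for article in citing_articles:
        journals.add(article["journal"])
        categories.update(article["categories"])
    return {"journals": len(journals), "categories": len(categories)}


# Made-up records for illustration only; not data from the study.
sample = [
    {"journal": "Journal A", "categories": ["Physical Sciences"]},
    {"journal": "Journal B", "categories": ["Multidisciplinary"]},
    {"journal": "Journal A", "categories": ["Physical Sciences", "Earth Sciences"]},
]

reach = disciplinary_reach(sample)
print(reach)
```

Using sets means a journal that cites the affiliated publications many times still counts once, which matches the idea of reach as breadth rather than citation volume.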
Conceptual frameworks and taxonomies are an important part of the emerging base of knowledge on the curation of research data. We present the Data Practices and Curation Vocabulary (DPCVocab), a functional vocabulary created for specifying relationships among data practices in research, types of data produced and used, and curation roles and activities. The vocabulary consists of 3 categories (Research Data Practices, Data, and Curation) with 187 terms validated through empirical studies of scientific data practices in the Earth and life sciences. The present article covers the DPCVocab development process and examines applications for mapping relationships across the 3 categories, identifying factors for projecting curation costs and important differences in curation requirements across disciplines. As a tool for curators, the vocabulary provides a framework for charting curation options and guiding systematic administration of curation services. It can serve as a shared terminology or lingua franca to support interactions and collaboration among curators, data producers, system developers, and other stakeholders in data infrastructure and services. The DPCVocab as a whole supports both the technical and the human aspects of professional curation work essential to the modern research system.
Understanding the methods and processes implemented by data producers to generate research data is essential for fostering data reuse. Yet, producing the metadata that describes these methods remains a time-intensive activity that data producers do not readily undertake. In particular, researchers in the long tail of science often lack the financial support or tools for metadata generation, thereby limiting future access and reuse of data produced. The present study investigates research journal publications as a potential source for identifying descriptive metadata about methods for research data. Initial results indicate that journal articles provide rich descriptive content that can be sufficiently mapped to existing metadata standards with methods-related elements, resulting in a mapping of the data production process for a study. This research has implications for enhancing the generation of robust metadata to support the curation of research data for new inquiry and innovation.
The structures of constructed MSTs are consistent with the sorting of SCI categories. The map of science is constructed based on our MST results. Such a map shows the relation among various knowledge clusters and their citation properties. The temporal evolution of the scientific world can also be delineated in the map. In particular, this map clearly shows a linear structure of the scientific world, which contains three major domains: physical sciences, life sciences, and medical sciences. The interaction of various knowledge fields can be clearly seen from this scientific world map. This approach can be applied to various levels of knowledge domains.
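The abstract above is truncated and does not define "MST" or how the trees were built. Assuming it refers to minimum spanning trees over a citation-based distance between subject categories, a minimal sketch using Kruskal's algorithm might look like the following. The category names, the citation counts, and the distance `1 / count` are all illustrative assumptions, not the authors' data or method.

```python
def minimum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm; weighted_edges is a list of (weight, u, v)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the component root, halving paths as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Made-up cross-citation counts between hypothetical categories.
citations = {
    ("physics", "chemistry"): 40,
    ("chemistry", "biology"): 30,
    ("biology", "medicine"): 50,
    ("physics", "medicine"): 2,
    ("physics", "biology"): 5,
}

# Assumption: categories that cite each other more are "closer".
edges = [(1.0 / count, u, v) for (u, v), count in citations.items()]
nodes = {name for pair in citations for name in pair}

mst = minimum_spanning_tree(nodes, edges)
for u, v, weight in mst:
    print(f"{u} -- {v} (distance {weight:.3f})")
```

With these made-up counts the tree keeps only the three strongest links, forming a chain from physics through chemistry and biology to medicine, which is the kind of linear arrangement the abstract reports for its own map.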
Metadata is essential to making datasets more accessible to others and to facilitating meaningful interpretation and use. In particular, data users expect information on the methods, or those processes undertaken to generate data, to be available for review and data quality assessment. This study focuses on how description of methods is supported by examining current metadata schemes for scientific research data. Analysis of these schemes indicates varying degrees of support and guidance for methods description, with the potential for more comprehensive elements documenting the data production process to be integrated and adopted. This preliminary investigation concludes with next steps toward a better understanding of metadata needs for research data and curation support of long‐term data sharing and reuse.
We present a general conceptual framework that maps relationships and dependencies among scientific data practices, types of data produced and used, and associated curation activities. As part of the Data Conservancy initiative, the framework is being elaborated through empirical studies of data practices in the earth and life sciences and validated against use cases as curatorial services are developed around data being prepared for ingest into the repository. The framework can be applied more broadly for identifying and representing curation requirements and to support description and assessment of existing or planned curation infrastructure and services. It will support full accounts of the data products and workflows required to maintain the coherence and context of complex data collections.