Results are presented from the Data Curation Profiles project research, on who is willing to share what data with whom and when. Emerging from scientists' discussions on sharing are several dimensions suggestive of the variation in both what it means 'to share' and how these processes are carried out. This research indicates that data curation services will need to accommodate a wide range of subdisciplinary data characteristics and sharing practices. As part of a larger set of strategies emerging across academic institutions, institutional repositories (IRs) will contribute to the stewardship and mobilization of scientific research data for e-Research and learning. There will be particular types of data that can be managed well in an IR context when characteristics and practices are well understood. Findings from this study elucidate scientists' views on 'sharable' forms of data (the particular representation that they view as most valued for reuse by others within their own research areas) and the anticipated duration for such reuse. Reported sharing incidents that provide insights into barriers to sharing and related concerns on data misuse are included.
The launch of the US BRAIN and European Human Brain Projects coincides with growing international efforts toward transparency and increased access to publicly funded research in the neurosciences. The need for data-sharing standards and neuroinformatics infrastructure is more pressing than ever. However, ‘big science’ efforts are not the only drivers of data-sharing needs, as neuroscientists across the full spectrum of research grapple with the overwhelming volume of data being generated daily and a scientific environment that is increasingly focused on collaboration. In this commentary, we consider the issue of sharing of the richly diverse and heterogeneous small data sets produced by individual neuroscientists, so-called long-tail data. We consider the utility of these data, the diversity of repositories and options available for sharing such data, and emerging best practices. We provide use cases in which aggregating and mining diverse long-tail data convert numerous small data sources into big data for improved knowledge about neuroscience-related disorders.
This paper presents a brief literature review and then introduces the methods, design, and construction of the Data Curation Profile, an instrument that can be used to provide detailed information on particular data forms that might be curated by an academic library. These data forms are presented in the context of the related sub-disciplinary research area, and they provide the flow of the research process from which these data are generated. The profiles also represent the needs for data curation from the perspective of the data producers, using their own language. As such, they support the exploration of data curation across different research domains in real and practical terms. With the sponsorship of the Institute of Museum and Library Services, investigators from Purdue University and the University of Illinois interviewed 19 faculty subjects to identify needs for discovery, access, preservation, and reuse of their research data. For each subject, a profile was constructed that includes information about his or her general research, data forms and stages, value of data, data ingest, intellectual property, organization and description of data, tools, interoperability, impact and prestige, data management, and preservation. Each profile also presents a specific dataset supplied by the subject to serve as a concrete example. The Data Curation Profiles are being published to a public wiki for questions and discussion, and a blank template will be disseminated with guidelines for others to create and share their own profiles. This study was conducted primarily from the viewpoint of librarians interacting with faculty researchers; however, it is expected that these findings will complement a wide variety of data curation research and practice outside of librarianship and the university environment.
As the basic sciences become increasingly information-intensive, the management and use of research data present new challenges in the collective activities that constitute scholarly and scientific communication. This also presents new opportunities for understanding the role of informatics in scientific work practices, and for designing new kinds of tools and resources needed to support them. These issues of data management, scientific communication and collective activity are brought together at once in scientific data collections (SDCs). What can the development and use of shared SDCs tell us about collective activity, dynamic infrastructures, and distributed scientific work? Using examples drawn from a nascent neuroscience data collection, we examine some unique features of SDCs to illustrate that they do more than act as infrastructures for scientific research. Instead, we argue that they are themselves instantiations of Distributed Collective Practice (DCP), and as such illustrate concepts of transition, emergence, and interdependency that may not be so apparent in other kinds of DCPs. We propose that research into SDCs can yield new insights into institutional arrangements, policymaking, and authority structures in other very large-scale socio-technical networks.
While problems related to the curation and preservation of scientific data are receiving considerable attention from the information science and digital repository communities, relatively little progress has been made on approaches for evaluating the value of data to inform investment in acquisition, curation, and preservation. Adapting Hjørland's concept of the "epistemological potential" of documents, we assert that analytic potential, or the value of data for analysis beyond its original use, should guide development of data collections for repositories aimed at supporting research. Three key aspects of the analytic potential of data are identified and discussed: preservation readiness, potential user communities, and fit for purpose. Based on evidence from research from the Data Conservancy initiative, we demonstrate how the analytic potential of data can be determined and applied to build large-scale data collections suited for grand challenge science.
Conceptual frameworks and taxonomies are an important part of the emerging base of knowledge on the curation of research data. We present the Data Practices and Curation Vocabulary (DPCVocab), a functional vocabulary created for specifying relationships among data practices in research, types of data produced and used, and curation roles and activities. The vocabulary consists of 3 categories (Research Data Practices, Data, and Curation) with 187 terms validated through empirical studies of scientific data practices in the Earth and life sciences. The present article covers the DPCVocab development process and examines applications for mapping relationships across the 3 categories, identifying factors for projecting curation costs and important differences in curation requirements across disciplines. As a tool for curators, the vocabulary provides a framework for charting curation options and guiding systematic administration of curation services. It can serve as a shared terminology or lingua franca to support interactions and collaboration among curators, data producers, system developers, and other stakeholders in data infrastructure and services. The DPCVocab as a whole supports both the technical and the human aspects of professional curation work essential to the modern research system.