Results are presented from the Data Curation Profiles project on who is willing to share what data with whom and when. Emerging from scientists' discussions on sharing are several dimensions suggestive of the variation in both what it means 'to share' and how these processes are carried out. This research indicates that data curation services will need to accommodate a wide range of subdisciplinary data characteristics and sharing practices. As part of a larger set of strategies emerging across academic institutions, institutional repositories (IRs) will contribute to the stewardship and mobilization of scientific research data for e-Research and learning. There will be particular types of data that can be managed well in an IR context when characteristics and practices are well understood. Findings from this study elucidate scientists' views on 'sharable' forms of data (the particular representation that they view as most valued for reuse by others within their own research areas) and the anticipated duration for such reuse. Also included are reported sharing incidents that provide insight into barriers to sharing and related concerns about data misuse.
This paper presents a brief literature review and then introduces the methods, design, and construction of the Data Curation Profile, an instrument that can be used to provide detailed information on particular data forms that might be curated by an academic library. These data forms are presented in the context of the related sub-disciplinary research area, and they provide the flow of the research process from which these data are generated. The profiles also represent the needs for data curation from the perspective of the data producers, using their own language. As such, they support the exploration of data curation across different research domains in real and practical terms. With the sponsorship of the Institute of Museum and Library Services, investigators from Purdue University and the University of Illinois interviewed 19 faculty subjects to identify needs for discovery, access, preservation, and reuse of their research data. For each subject, a profile was constructed that includes information about his or her general research, data forms and stages, value of data, data ingest, intellectual property, organization and description of data, tools, interoperability, impact and prestige, data management, and preservation. Each profile also presents a specific dataset supplied by the subject to serve as a concrete example. The Data Curation Profiles are being published to a public wiki for questions and discussion, and a blank template will be disseminated with guidelines for others to create and share their own profiles. This study was conducted primarily from the viewpoint of librarians interacting with faculty researchers; however, it is expected that these findings will complement a wide variety of data curation research and practice outside of librarianship and the university environment.
In January of 2011, the National Science Foundation began requiring that all proposals for research funding include data management plans. At the time of the mandate, Purdue University's libraries and campus information technology units had been collaborating on enhancements to the HUBzero virtual research environment. These efforts were parlayed into the development of an institutional, digital data repository and service with the support of the campus research office. In the process, local library science practices have been extended to facilitate research data curation and cyberinfrastructure on campus. Librarians are consulting on data management plans, conducting data reference and instruction, advising on data organization and description, and stewarding collections of data within an evolving library service framework.
Broadly speaking, the lack of a framework for organizing, preserving, and making research data available for the long term has resulted in valuable datasets becoming lost or discarded. The approach of the Distributed Data Curation Center of the Purdue University Libraries has been to integrate librarians and the principles of library and archival sciences with domain sciences, computer and information sciences, and information technology to address the challenges of managing collections of research data and to learn how to better support interdisciplinary research through data curation. One piece of infrastructure that supports these activities is a "distributed institutional repository" that includes electronic documents, digitized archival collections, and research datasets housed in multiple systems that are connected together using Web Services and other middleware. Concurrently, roles for librarians and institutional repositories in data curation are being explored.
The History of the Well-Run Laboratory
You can imagine a bygone time from the history of the well-run laboratory when scientists arrived for work in the morning, put on their lab coats, and checked their lab notebooks out of a locked cabinet. The notebooks were assigned to them individually and contained detailed descriptions of their experiments, parameters, annotations, and results in an orderly, structured format. At the end of the day, they signed and returned their notebooks to the cabinet. The notebooks were preserved in an archive as a part of the scientific record and the annals of the lab. R. A. Baker outlined the regimen for chemistry educators back in 1933: