Abstract: Site-Based Data Curation (SBDC) is an approach to managing research data that prioritizes sharing and reuse of data collected at scientifically significant sites. The SBDC framework is based on geobiology research at natural hot spring sites in Yellowstone National Park as an exemplar case of high value field data in contemporary, cross-disciplinary earth systems science. Through stakeholder analysis and investigation of data artifacts, we determined that meaningful and valid reuse of digital hot spring data r…
“…The students reported finding this prospective discussion of data collection methods and best practices helpful in their work, and many were successful at producing robust metadata in their field books that would be key for curating and sharing their data. The spreadsheet template, as well as an excerpt of one student's completed template are available in our Supplemental Materials (https://doi.org/10.6084/m9.figshare.5450809); this work is also discussed further by Palmer et al ().…”
Section: Discussion
“…To that end, we developed a method of Research Process Modeling. This approach draws on systems analysis and information modeling approaches, and is informed by both our prior work on this project (Palmer et al, ), and prior work on computational process curation (Goble et al, ) and workflow‐centric research objects (for example, Bechhofer et al, ; Belhajjame et al, ). The simple inventory described above became one of four components required to document the artifacts, processes, and relationships involved in the collection of physical samples and observational data.…”
Section: Methods
“…Additionally, the inventory identifies the Minimum Information Framework (MIF) superclass and the Formats for each data artifact. The “MIF superclasses” are drawn from our prior work developing a high‐level information model for geobiology field data (Palmer et al, ). Classifying data artifacts according to MIF classes is helpful in supporting reuse of data for new purposes, and may aid access and retrieval functions as data collections are brought together in repositories over time.…”
Section: Methods
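The inventory described in the excerpt above pairs each data artifact with a MIF superclass and its formats. As a rough illustration of how such an inventory might be structured in practice, here is a minimal Python sketch; the superclass names and artifact entries below are invented for illustration and are not drawn from the published MIF.

```python
from dataclasses import dataclass

# Hypothetical inventory sketch: each field data artifact is recorded with a
# MIF superclass and the formats in which it exists. Class names such as
# "ObservationalData" and "SampleMetadata" are assumptions, not the published model.
@dataclass
class ArtifactRecord:
    name: str
    mif_superclass: str
    formats: list[str]

inventory = [
    ArtifactRecord("field notebook page", "ObservationalData",
                   ["scanned TIFF", "transcribed CSV"]),
    ArtifactRecord("water chemistry readings", "ObservationalData", ["CSV"]),
    ArtifactRecord("travertine sample label", "SampleMetadata",
                   ["paper tag", "spreadsheet row"]),
]

# Grouping artifacts by superclass is one way the classification could support
# the access and retrieval functions mentioned above.
by_class: dict[str, list[str]] = {}
for rec in inventory:
    by_class.setdefault(rec.mif_superclass, []).append(rec.name)
```

A repository aggregating collections over time could use such groupings to retrieve, say, all observational data across projects regardless of file format.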
“…Through the SBDC project, we sought to support the aggregation and integration of geobiology data within and across scientifically significant sites. In collaboration with geobiologists and National Park Service (NPS) personnel, we developed a Minimum Information Framework of key information classes that ought to be prioritized for collection and curation (Palmer et al, ). We additionally used the approaches described herein to identify optimal points of curatorial intervention in the research workflow; these are points at which data should be optimally documented and managed, thereby making field‐based processes retraceable, and the data collected reliably interpretable and reusable.…”
Section: The Case: Geobiology at Yellowstone National Park
A comprehensive record of research data provenance is essential for the successful curation, management, and reuse of data over time. However, creating such detailed metadata can be onerous, and there are few structured methods for doing so. In this case study of data curation in support of geobiology research conducted at Yellowstone National Park, we describe a method of "Research Process Modeling" for documenting noncomputational data provenance in a structured yet flexible way. The method combines systems analysis techniques to model research activities, the World Wide Web Consortium Provenance (PROV) ontology to illustrate relationships between data products, and simple inventory methods to account for research processes and data products. It also supports collaborative data curation between information professionals and researchers, and is therefore a significant step toward producing more useable and interpretable research data. We demonstrate how this method describes data provenance more robustly than "flat" metadata alone and fills a critical gap in the documentation of provenance for field-based and noncomputational workflows. We discuss potential applications of this approach to other research domains.
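The abstract above describes combining the W3C PROV ontology with simple inventories to relate data products. To make the idea concrete, here is a minimal, stdlib-only sketch of PROV-style relationships recorded as triples and traversed to recover a derivation chain. The predicate names follow the W3C PROV vocabulary; the specific artifacts ("field_notebook", "site_visit_2016", etc.) are invented for illustration, and a real implementation would more likely use an RDF library.

```python
# Each triple is (subject, predicate, object), using PROV predicate names.
triples = [
    ("transcribed_spreadsheet", "prov:wasDerivedFrom", "field_notebook"),
    ("geochemistry_table", "prov:wasDerivedFrom", "transcribed_spreadsheet"),
    ("field_notebook", "prov:wasGeneratedBy", "site_visit_2016"),
]

def derivation_chain(entity, triples):
    """Walk prov:wasDerivedFrom links back to the original artifact."""
    chain = [entity]
    while True:
        sources = [o for s, p, o in triples
                   if s == chain[-1] and p == "prov:wasDerivedFrom"]
        if not sources:
            return chain
        chain.append(sources[0])
```

Calling `derivation_chain("geochemistry_table", triples)` traces the table back through the transcribed spreadsheet to the field notebook, which is the kind of noncomputational provenance question flat metadata alone cannot answer.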
“…Detailed description of all protocols and techniques used for field collection, biomolecule extractions, and meta-omic analyses are presented in the Supplementary Information and briefly summarized here. Detailed descriptions of the experimental design and metadata curation strategies adopted for all aspects of the field and laboratory analyses in the present study are presented in the works of Palmer et al (2017) and Thomer et al (2018).…”
The evolutionarily ancient Aquificales bacterium Sulfurihydrogenibium spp. dominates filamentous microbial mat communities in shallow, fast-flowing, and dysoxic hot-spring drainage systems around the world. In the present study, field observations of these fettuccini-like microbial mats at Mammoth Hot Springs in Yellowstone National Park are integrated with geology, geochemistry, hydrology, microscopy, and multi-omic molecular biology analyses. Strategic sampling of living filamentous mats along with the hot-spring CaCO3 (travertine) in which they are actively being entombed and fossilized has permitted the first direct linkage of Sulfurihydrogenibium spp. physiology and metabolism with the formation of distinct travertine streamer microbial biomarkers. Results indicate that, during chemoautotrophy and CO2 carbon fixation, the 87–98% Sulfurihydrogenibium-dominated mats utilize chaperones to facilitate enzyme stability and function. High-abundance transcripts and proteins for type IV pili and extracellular polymeric substances (EPSs) are consistent with their strong mucus-rich filaments tens of centimeters long that withstand hydrodynamic shear as they become encrusted by more than 5 mm of travertine per day. Their primary energy source is the oxidation of reduced sulfur (e.g., sulfide, sulfur, or thiosulfate) and the simultaneous uptake of extremely low concentrations of dissolved O2 facilitated by bd-type cytochromes. The formation of elevated travertine ridges permits the Sulfurihydrogenibium-dominated mats to create a shallow platform from which to access low levels of dissolved oxygen at the virtual exclusion of other microorganisms. These ridged travertine streamer microbial biomarkers are well preserved and create a robust fossil record of microbial physiological and metabolic activities in modern and ancient hot-spring ecosystems.
Scientifically significant sites are the source of, and long-term repository for, considerable amounts of data, particularly in the natural sciences. However, the unique data practices of the researchers and resource managers at these sites have been relatively understudied. Through case studies of two scientifically significant sites (the hot springs at Yellowstone National Park and the fossil deposits at the La Brea Tar Pits), I developed rich descriptions of site-based research and data curation, and high-level data models of information classes needed to support integrative data reuse. Each framework treats the geospatial site and its changing natural characteristics as a distinct class of information; more commonly considered information classes such as observational and sampling data, and project metadata, are defined in relation to the site itself. This work contributes (a) case studies of the values and data needs for researchers and resource managers at scientifically significant sites, (b) an information framework to support integrative reuse at these sites, and (c) a discussion of data practices at scientifically significant sites.
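The framework described above treats the site itself as a distinct information class, with other classes defined in relation to it. A minimal sketch of that modeling choice might look like the following; the class and field names are assumptions for illustration, not the published framework.

```python
from dataclasses import dataclass, field

# Sketch of a site-centric data model: the site is a first-class record, and
# sampling data references it rather than duplicating locality information.
@dataclass
class Site:
    name: str
    location: tuple[float, float]  # (lat, lon); site characteristics change over time

@dataclass
class SamplingEvent:
    site: Site                     # defined in relation to the site itself
    date: str
    samples: list[str] = field(default_factory=list)

mammoth = Site("Mammoth Hot Springs", (44.97, -110.70))
event = SamplingEvent(mammoth, "2016-07-12", ["travertine core A"])
```

Because every sampling event points back to one shared `Site` record, data collected by different projects at the same site can be aggregated without reconciling per-project locality descriptions.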