The U.S. Department of Energy's Watershed Function Scientific Focus Area (SFA), centered in the East River, Colorado, generates diverse datasets including hydrological, geological, geochemical, geophysical, ecological, microbiological, and remote sensing data. The project has deployed extensive field infrastructure involving hundreds of sensors that measure highly diverse phenomena (e.g., stream and groundwater hydrology, water quality, soil moisture, weather) across the watershed. Data from the sensor network are telemetered and automatically ingested into a queryable database. The data are subsequently quality checked, integrated with the United States Geological Survey's stream monitoring network using a custom data integration broker, and published to a portal with interactive visualizations. The resulting data products are used in a variety of scientific modeling and analytical efforts. This paper describes the SFA's end-to-end infrastructure and services that support the generation of integrated datasets from a watershed sensor network. The development and maintenance of this infrastructure presents a suite of challenges, from practical field logistics to complex data processing, which are addressed through various solutions. In particular, the SFA adopts a holistic view of data collection, assessment, and integration, which dramatically improves the products generated and enables a co-design approach wherein data collection is informed by model results and vice versa.
Physical samples are foundational entities for research across biological, Earth, and environmental sciences. Data generated from sample-based analyses are not only the basis of individual studies, but can also be integrated with other data to answer new and broader-scale questions. Ecosystem studies increasingly rely on multidisciplinary team science to study climate and environmental changes. While there are widely adopted conventions within certain domains to describe sample data, these have gaps when applied in a multidisciplinary context. In this study, we reviewed existing practices for identifying, characterizing, and linking related environmental samples. We then tested the practicalities of assigning persistent identifiers to samples, with standardized metadata, in a pilot field test involving eight United States Department of Energy projects. Participants collected a variety of sample types, with analyses conducted across multiple facilities. We address terminology gaps for multidisciplinary research and make recommendations for assigning identifiers and metadata that support sample tracking, integration, and reuse. Our goal is to provide a practical approach to sample management, geared towards ecosystem scientists who contribute and reuse sample data.
• Developing data standards on Version Control System platforms like GitHub enables collaboration and transparency.
• Many standards do not use tools for collaboration: issue tracking, licensing, and automated website hosting (GitBook or GitHub Pages).
• We make recommendations and provide templates for creating descriptive version-controlled data standard documentation on GitHub.
Research can be more transparent and collaborative by using Findable, Accessible, Interoperable, and Reusable (FAIR) principles to publish Earth and environmental science data. Reporting formats—instructions, templates, and tools for consistently formatting data within a discipline—can help make data more accessible and reusable. However, the immense diversity of data types across Earth science disciplines makes development and adoption challenging. Here, we describe 11 community reporting formats for a diverse set of Earth science (meta)data including cross-domain metadata (dataset metadata, location metadata, sample metadata), file-formatting guidelines (file-level metadata, CSV files, terrestrial model data archiving), and domain-specific reporting formats for some biological, geochemical, and hydrological data (amplicon abundance tables, leaf-level gas exchange, soil respiration, water and sediment chemistry, sensor-based hydrologic measurements). More broadly, we provide guidelines that communities can use to create new (meta)data formats that integrate with their scientific workflows. Such reporting formats have the potential to accelerate scientific discovery and predictions by making it easier for data contributors to provide (meta)data that are more interoperable and reusable.