<p>The Thematic Real-Time Environmental Distributed Data Services (THREDDS) Data Server is an open-source, Java-based web application that provides metadata and data access to scientific netCDF datasets. In recent years, a growing number of research institutes have implemented THREDDS to give researchers and other end-users access to a wide range of real-time and archival datasets from the earth system sciences.</p><p>THREDDS provides a number of features and interfaces that facilitate the interactive and automated exploration, standardization and use of data, such as the automated generation of ISO-formatted metadata files and the provision of OGC services (WMS and WCS). However, configuring THREDDS via XML catalogs remains difficult and is usually restricted to system administrators. In particular, publishing and consistently maintaining a large number of datasets is error-prone and time-consuming.</p><p>Within the Model Data Explorer (MDE, https://model-data-explorer.readthedocs.io), a cross-institutional project to simplify the FAIR publication of model data on the web, we are developing a module to overcome these configuration issues and enable scientists to make their environmental research data available on the web. This MDE-THREDDS module manages the catalogs and configurations of the THREDDS Data Server by providing a user-friendly web interface for handling major components of THREDDS, including catalogs and web services. A flexible permission system enables scientists and other data producers to add and update their own datasets without manually editing the underlying THREDDS catalogs. 
This permission system further allows server administrators to moderate and facilitate the publication of data on the web by scientists and other end-users, which in turn ensures a standardized and consistent THREDDS catalog infrastructure.</p><p>Overall, with MDE-THREDDS, we want to give scientists and other data producers a simple and user-friendly framework for making their research data open and FAIR through a wide range of standardized and well-established web interfaces.</p>
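<p>To illustrate the kind of XML catalog that would otherwise have to be edited by hand, the following is a minimal sketch that generates a THREDDS client catalog from a list of dataset records. It is not the MDE-THREDDS implementation: the function name <code>build_catalog</code>, the record keys (<code>name</code>, <code>id</code>, <code>url_path</code>), the service names and the example paths are assumptions made for this illustration, and the service base paths are those of a default THREDDS installation.</p>

```python
import xml.etree.ElementTree as ET

# Namespace of THREDDS client catalogs (InvCatalog 1.0).
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def _tag(name):
    """Return the namespace-qualified tag for a catalog element."""
    return f"{{{THREDDS_NS}}}{name}"


def build_catalog(catalog_name, datasets):
    """Build a catalog XML string from dataset records (hypothetical helper).

    Each record is a dict with the assumed keys 'name', 'id' and 'url_path'.
    """
    ET.register_namespace("", THREDDS_NS)
    catalog = ET.Element(_tag("catalog"), {"name": catalog_name})

    # One compound service bundling OPeNDAP, WMS and WCS access.
    compound = ET.SubElement(
        catalog,
        _tag("service"),
        {"name": "all", "serviceType": "Compound", "base": ""},
    )
    for name, service_type, base in (
        ("odap", "OPENDAP", "/thredds/dodsC/"),
        ("wms", "WMS", "/thredds/wms/"),
        ("wcs", "WCS", "/thredds/wcs/"),
    ):
        ET.SubElement(
            compound,
            _tag("service"),
            {"name": name, "serviceType": service_type, "base": base},
        )

    # One <dataset> element per record, each served through the compound service.
    for record in datasets:
        element = ET.SubElement(
            catalog,
            _tag("dataset"),
            {
                "name": record["name"],
                "ID": record["id"],
                "urlPath": record["url_path"],
            },
        )
        ET.SubElement(element, _tag("serviceName")).text = "all"

    return ET.tostring(catalog, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example record; the path is a placeholder.
    print(build_catalog(
        "Example model data",
        [{"name": "Run 1", "id": "example/run1", "url_path": "example/run1.nc"}],
    ))
```

<p>Generating the catalog from structured records, rather than editing XML directly, is one way to keep many datasets consistent: every entry passes through the same template, so a change to the service definition applies to all of them.</p>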
<p>In recent years, the requirements for data from the earth system sciences have increased massively. Data from observation systems need to be transferred into larger research data infrastructures, evaluated and flagged via well-defined quality checks, enriched with standardized metadata and finally made available to the public via standard interfaces. To fulfill the FAIR principles, we also have to ensure the transparency and reproducibility of all these steps. Moreover, the rising demand for near-real-time (NRT) data requires the whole data pipeline to run operationally with minimal manual effort.</p><p>However, in many cases, data landscapes are still heterogeneous and lack centralized control of data, data processing, version control and QA/QC (Quality Assurance / Quality Control). This is often aggravated by inconvenient, outdated and isolated tools and software solutions.</p><p>Therefore, we are developing and implementing an adaptable automated pipeline that combines the assurance of data consistency, QA/QC, graphically supported validation, and unified persistence and publication of data. User-friendliness is achieved by making the system configurable and trackable through lightweight user interfaces over the complete data lifecycle. By using only open-source software and applying community standards for data formats and interfaces, we aim to ensure a high level of sustainability and independence.</p><p>In this presentation, we demonstrate such an end-to-end data pipeline, which ultimately allows for the FAIRification of typical environmental sensor data.</p>
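<p>As an illustration of what "evaluated and flagged via well-defined quality checks" can mean in practice, the following minimal sketch applies a range check and a step check to a sensor time series. It is not the pipeline described above: the function name <code>flag_series</code>, the flag labels and the thresholds are assumptions chosen for this example.</p>

```python
# Hypothetical flag labels; real pipelines usually follow a published flag scheme.
GOOD, SUSPECT, BAD = "good", "suspect", "bad"


def flag_series(values, valid_min, valid_max, max_step):
    """Return one quality flag per value.

    - Missing (None) or out-of-range values are flagged BAD.
    - An in-range value that differs from the previous in-range value
      by more than max_step is flagged SUSPECT.
    - Everything else is flagged GOOD.
    """
    flags = []
    previous = None  # last in-range value seen so far
    for value in values:
        if value is None or not (valid_min <= value <= valid_max):
            flags.append(BAD)
            continue
        if previous is not None and abs(value - previous) > max_step:
            flags.append(SUSPECT)
        else:
            flags.append(GOOD)
        previous = value
    return flags


if __name__ == "__main__":
    # Hypothetical air temperature readings in degrees Celsius.
    readings = [10.0, 10.5, 99.0, 10.7, None, 25.0]
    for reading, flag in zip(readings, flag_series(readings, 0.0, 50.0, 5.0)):
        print(reading, flag)
```

<p>Keeping the flags next to the unmodified raw values, instead of deleting or correcting data, is what keeps such a step transparent and reproducible: the same checks can be rerun with other thresholds at any time.</p>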
<p>A contemporary and flexible Research Data Management (RDM) framework is required to make environmental research data Findable, Accessible, Interoperable, and Reusable (FAIR) and, hence, to provide the foundation for open and reproducible earth system sciences. While datasets that accompany scientific articles are typically published via large data repositories like Pangaea or Zenodo, intermediate, day-to-day, or actively used data (e.g., data from research projects or prototypical data) are still exchanged via simple cloud storage services and email. And while the FAIR principles require data to be openly findable and accessible, such data are often only available within closed and restricted infrastructures and local file systems.</p><p>Our research project Cat4KIT therefore aims to develop a cross-institutional catalog and RDM framework for the FAIRification of such day-to-day research data. This framework comprises four modules / services for</p><ul><li> <p>providing access to data on storage systems through well-defined and standardized interfaces</p> </li> <li> <p>harvesting and transforming (meta)data into standardized formats</p> </li> <li> <p>making (meta)data accessible to the public using well-defined and standardized catalog services and interfaces</p> </li> <li> <p>enabling users to search, filter, and explore data from decentralized research data infrastructures.</p> </li> </ul><p>We develop, implement and evaluate each of these four modules within an inter-institutional consortium consisting of scientists, software developers and potential end-users. This allows us to include a wide range of research data, from multi-dimensional climate model outputs to high-frequency in-situ measurements. 
We emphasize the application of existing open-source solutions and community standards for data interfaces (THREDDS, STA, S3), (meta)data schemes, and catalog services (SpatioTemporal Asset Catalog, STAC) in order to ensure an easy integration of research data into the Cat4KIT framework and a straightforward extension to further research data infrastructures.</p><p>In our presentation, we demonstrate the current status of our Cat4KIT framework as an inter-institutional research data management and catalog platform for the FAIRification of day-to-day research data.</p>
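<p>To make the role of STAC as a catalog standard more concrete, the following minimal sketch builds a STAC Item (a GeoJSON Feature) that points to a dataset file. It is not the Cat4KIT harvesting code: the function name <code>make_stac_item</code>, the identifier, the bounding box, the URL and the <code>application/x-netcdf</code> media type are assumptions made for this example.</p>

```python
import json
from datetime import datetime, timezone


def make_stac_item(item_id, bbox, when, data_href):
    """Build a minimal STAC Item as a plain dict (hypothetical helper).

    bbox is (west, south, east, north) in degrees; when is a timezone-aware
    datetime; data_href is the URL under which the data can be accessed.
    """
    west, south, east, north = bbox
    # Footprint polygon derived from the bounding box (closed ring).
    geometry = {
        "type": "Polygon",
        "coordinates": [[
            [west, south],
            [east, south],
            [east, north],
            [west, north],
            [west, south],
        ]],
    }
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": list(bbox),
        "geometry": geometry,
        "properties": {
            # STAC expects an RFC 3339 timestamp in UTC.
            "datetime": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        "links": [],
        "assets": {
            "data": {
                "href": data_href,
                "type": "application/x-netcdf",  # assumed media type
                "roles": ["data"],
            },
        },
    }


if __name__ == "__main__":
    item = make_stac_item(
        "example-run1",                      # placeholder identifier
        (8.0, 48.0, 9.0, 49.0),              # placeholder bounding box
        datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc),
        "https://example.org/thredds/dodsC/example/run1.nc",  # placeholder URL
    )
    print(json.dumps(item, indent=2))
```

<p>Because every item carries the same small set of spatial and temporal fields, a search interface can filter data from otherwise unrelated storage systems by region and time without knowing their internal structure.</p>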