Providing homogeneous access ('services') to heterogeneous environmental data distributed across heterogeneous computing systems on a wide area network requires a robust information paradigm that can mediate between differing storage and information formats. While there are a number of ISO standards that provide some guidance on how to do this, the information landscape within domains is not well described. In this paper, we present an information taxonomy and two information components, which have been built for a specific application. These two components, one to aid data understanding and the other to aid data manipulation, are both deployed in the UK NERC DataGrid as described elsewhere.
This short paper outlines the key components of the NERC DataGrid: a discovery service, a vocabulary service and a software stack deployed both centrally to provide a data discovery portal, and at data providers to provide local portals and data and metadata services.
The problem of sharing environmental and climate data (from measurements or models) across networks does not yet have a standard solution. Ad hoc approaches are common, with complications arising from the number of different file formats and conventions in use. The state of the art in the climate research community is the Distributed Oceanographic Data System (DODS, also “OPeNDAP”), which maps a file or aggregation of files onto a URL. Data subsets may be retrieved, and limited (and non-standard) metadata queries made. Data are abstracted from storage only to a limited degree. We present an alternative data access mechanism, the Grid Access Data Service (GADS), better suited to Grid applications. The requirements of data abstracted from storage, rich metadata models, flexible delivery, security, and orchestrated workflows all suggest a Web Service solution. The GADS Web Service has two operations: a querying mechanism and a request mechanism. Three metadata models are required: one for the abstract representation of climate data, one for characterizing data usage, and one for describing data storage artifacts. A compatibility interface has been layered on the Web Service to provide DODS/OPeNDAP functionality. A visualization Web portal has also been built to interface with GADS and demonstrate its extensibility.
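The two-operation design with three metadata models can be illustrated with a minimal Python sketch. This is not the GADS interface itself: every name here (`ToyDataService`, `AbstractDataset`, `StorageDescriptor`, `UsageRecord`, the `tas` variable and the file path) is hypothetical, and reading the "data usage" model as a simple log of operations is an assumption, since the abstract does not define it. The sketch only shows how a query operation can expose abstract metadata while storage details stay hidden behind the service.

```python
from dataclasses import dataclass


@dataclass
class AbstractDataset:
    """Storage-independent description of a dataset (hypothetical model)."""
    dataset_id: str
    variables: dict  # variable name -> tuple of axis names


@dataclass
class StorageDescriptor:
    """Where and how the data are stored; never returned to clients."""
    dataset_id: str
    file_paths: list
    file_format: str


@dataclass
class UsageRecord:
    """Assumed reading of the 'data usage' model: one entry per operation."""
    dataset_id: str
    operation: str


class ToyDataService:
    """Illustrative service with a query operation and a request operation."""

    def __init__(self):
        self._datasets = {}
        self._storage = {}
        self.usage_log = []

    def register(self, dataset, storage):
        self._datasets[dataset.dataset_id] = dataset
        self._storage[dataset.dataset_id] = storage

    def query(self, dataset_id):
        """Return abstract metadata only; storage details stay hidden."""
        dataset = self._datasets[dataset_id]
        self.usage_log.append(UsageRecord(dataset_id, "query"))
        return {"dataset_id": dataset.dataset_id,
                "variables": dict(dataset.variables)}

    def request(self, dataset_id, variable, bounds):
        """Describe the subset to deliver, expressed in abstract axes."""
        axes = self._datasets[dataset_id].variables[variable]
        unknown = set(bounds) - set(axes)
        if unknown:
            raise ValueError(f"unknown axes for {variable}: {sorted(unknown)}")
        self.usage_log.append(UsageRecord(dataset_id, "request"))
        return {"dataset_id": dataset_id,
                "variable": variable,
                "bounds": {axis: bounds.get(axis) for axis in axes}}


service = ToyDataService()
service.register(
    AbstractDataset("demo", {"tas": ("time", "lat", "lon")}),
    StorageDescriptor("demo", ["/data/demo_part1.nc"], "netCDF"),
)
meta = service.query("demo")
subset = service.request("demo", "tas",
                         {"lat": (50.0, 60.0), "lon": (-10.0, 2.0)})
```

In this sketch a client selects data by variable name and axis bounds rather than by file, which is one way to read "data abstracted from storage"; a real service would resolve the request against the storage descriptor and stream the result.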
Traditionally, the formal scientific output in most fields of natural science has been limited to peer-reviewed academic journal publications, with less attention paid to the chain of intermediate data results and their associated metadata, including provenance. In effect, this has confined the representation and verification of data provenance to the related publications. Detailed knowledge of a dataset’s provenance is essential to establish the pedigree of the data for its effective re-use, and to avoid redundant re-enactment of the experiment or computation involved. For open-access data, it is increasingly important to be able to determine authenticity and quality, especially given the growing volume of datasets appearing in the public domain. To address these issues, we present an approach that combines the Digital Object Identifier (DOI), a widely adopted citation technique, with existing, widely adopted climate science data standards to formally publish the detailed provenance of a climate research dataset as an associated scientific workflow. This is integrated with linked-data compliant data re-use standards (e.g. OAI-ORE) to enable a seamless link between a publication and the complete lineage trail of the corresponding dataset, including the dataset itself.
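The idea of linking a cited identifier to a dataset and its workflow can be sketched as an OAI-ORE style resource map held in a plain Python dictionary. The predicates `ore:describes`, `ore:aggregates` and `dcterms:identifier` come from the ORE and Dublin Core vocabularies; everything else is an assumption made for illustration: the helper `build_resource_map`, the `example.org` URIs, and the DOI, which uses a test prefix and does not identify a real dataset. The paper's actual serialization and workflow format are not given in the abstract.

```python
def build_resource_map(rem_uri, doi, aggregated):
    """Return a dict shaped like an ORE resource map (illustrative only)."""
    aggregation_uri = rem_uri + "#aggregation"
    return {
        # The resource map describes exactly one aggregation.
        rem_uri: {"ore:describes": aggregation_uri},
        # The aggregation carries the citable identifier and its parts.
        aggregation_uri: {
            "dcterms:identifier": "doi:" + doi,
            "ore:aggregates": list(aggregated),
        },
    }


# Hypothetical URIs and a placeholder DOI.
REM_URI = "https://example.org/rem/demo"
resource_map = build_resource_map(
    REM_URI,
    "10.5555/example-dataset",
    [
        "https://example.org/data/demo.nc",        # the dataset itself
        "https://example.org/workflow/demo.xml",   # the provenance workflow
        "https://example.org/paper/demo",          # the related publication
    ],
)
```

Resolving the identifier to such a map would let a reader move from the publication to the dataset and to the workflow that produced it, which is the link the abstract describes.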