This short paper outlines the key components of the NERC DataGrid: a discovery service, a vocabulary service and a software stack deployed both centrally to provide a data discovery portal, and at data providers to provide local portals and data and metadata services.
Abstract. The Climate Science Modelling Language (CSML) has been developed by the NERC DataGrid (NDG) project as a standards-based data model and XML markup for describing and constructing climate science datasets. It uses conceptual models from emerging standards in GIS to define a number of feature types, and adopts schemas of the Geography Markup Language (GML) where possible for encoding. A prototype deployment of CSML is being trialled across the curated archives of the British Atmospheric and Oceanographic Data Centres. These data include a wide range of data types, both observational and model, and heterogeneous file-based storage systems. CSML provides a semantic abstraction layer for data files, and is exposed through higher level data delivery services. In NDG these will include file instantiation services (for formats of choice) and the web services of the Open Geospatial Consortium (OGC).
Data services for the Grid have focussed so far primarily on virtualising access to distributed databases and encapsulating file location. However, orchestration of services requires richer information semantics than these mechanisms provide. Service inputs and outputs must be semantically matched, or characterised so that sensible transformations may be performed. In many domains, important information structures must be aggregated across multiple files, and numerous legacy file formats obscure the natural logical structure of information types. We present a solution for constructing semantic data services for an earth-sciences data Grid (the UK NERC DataGrid). A semantically rich data model is developed, drawing on components from external ontologies. A 'storage descriptor' provides the mechanism for mapping legacy file-based storage onto data model instances. Finally, data services may be built on top of the data model to expose a semantic view of the data irrespective of the underlying file storage details. Our approach is similar to wrapper/mediator architectures for integrating database management systems.
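The storage-descriptor idea can be illustrated with a minimal Python sketch. It is not the NDG or CSML implementation: the names `StorageDescriptor`, `SeriesFeature`, `READERS` and `build_demo`, and the toy CSV and JSON files standing in for legacy formats, are all assumptions made for illustration. The sketch only shows the general pattern: a descriptor records which files and which variable hold a feature's values, and the feature exposes those values without the caller knowing the file format.

```python
import csv
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


# Hypothetical format readers: each returns the values of one named variable.
def read_csv_variable(path: Path, variable: str) -> List[float]:
    with path.open(newline="") as handle:
        return [float(row[variable]) for row in csv.DictReader(handle)]


def read_json_variable(path: Path, variable: str) -> List[float]:
    with path.open() as handle:
        return [float(value) for value in json.load(handle)[variable]]


# Registry mapping a format name to its reader (the "wrapper" layer).
READERS: Dict[str, Callable[[Path, str], List[float]]] = {
    "csv": read_csv_variable,
    "json": read_json_variable,
}


@dataclass
class StorageDescriptor:
    """Records where one feature's values live on disk (illustrative only)."""
    file_format: str
    paths: List[Path]
    variable: str

    def extract(self) -> List[float]:
        reader = READERS[self.file_format]
        values: List[float] = []
        for path in self.paths:  # aggregate one logical series across files
            values.extend(reader(path, self.variable))
        return values


@dataclass
class SeriesFeature:
    """A toy data-model instance: a named series backed by a descriptor."""
    name: str
    storage: StorageDescriptor

    def values(self) -> List[float]:
        # Callers see the feature, never the underlying file format.
        return self.storage.extract()


def build_demo(directory: Path) -> List[SeriesFeature]:
    """Write small stand-in 'legacy' files and describe them as features."""
    csv_a = directory / "temps_a.csv"
    csv_a.write_text("time,temp\n0,280.5\n1,281.0\n")
    csv_b = directory / "temps_b.csv"
    csv_b.write_text("time,temp\n2,281.5\n")
    json_file = directory / "salinity.json"
    json_file.write_text(json.dumps({"sal": [35.1, 35.2]}))
    return [
        SeriesFeature("air_temperature",
                      StorageDescriptor("csv", [csv_a, csv_b], "temp")),
        SeriesFeature("sea_water_salinity",
                      StorageDescriptor("json", [json_file], "sal")),
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for feature in build_demo(Path(tmp)):
            print(feature.name, feature.values())
```

In this sketch, supporting a further file format would only require registering another reader in `READERS`; the feature class and any service built on it stay unchanged, which is the property the wrapper/mediator comparison points to.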