This short paper outlines the key components of the NERC DataGrid: a discovery service, a vocabulary service and a software stack deployed both centrally to provide a data discovery portal, and at data providers to provide local portals and data and metadata services.
The Climate Science Modelling Language (CSML) has been developed by the NERC DataGrid (NDG) project as a standards-based data model and XML markup for describing and constructing climate science datasets. It uses conceptual models from emerging standards in GIS to define a number of feature types, and adopts schemas of the Geography Markup Language (GML) where possible for encoding. A prototype deployment of CSML is being trialled across the curated archives of the British Atmospheric and Oceanographic Data Centres. These data include a wide range of data types, both observational and model, and heterogeneous file-based storage systems. CSML provides a semantic abstraction layer for data files, and is exposed through higher level data delivery services. In NDG these will include file instantiation services (for formats of choice) and the web services of the Open Geospatial Consortium (OGC).
This paper provides an overview of the new Secure Web Interface for the Medical Research Council (MRC) National Survey of Health and Development (NSHD), which is a web-based data and metadata access system for the longest-running longitudinal study of health in the world [1]. Accessing NSHD metadata and data has often been a challenge for external (non-MRC) users, because the underlying data and metadata formats have changed dramatically in 63 years of operation, and because the processes involved in metadata search and data access were manual and usually required "on-site" access [2]. The design goals of SWIFT include maintaining the confidentiality and privacy of study members, enabling metadata search access for internal (MRC) and external users, facilitating data downloading and extraction from a range of underlying formats, and implementing procedures to ensure compliance with the governance policies of the MRC and the NSHD's governance panel. This paper details some of the challenges and successes of the MRC pathfinder "Data Access Project" to enable access using SWIFT while protecting the study members and their data.
Data services for the Grid have focussed so far primarily on virtualising access to distributed databases, and encapsulating file location. However, orchestration of services requires richer information semantics than these mechanisms provide. Service inputs and outputs must be semantically matched, or characterised in order that sensible transformations may be performed. In many domains important information structures must be aggregated across multiple files, and numerous legacy file formats obscure the natural logical structure of information types. We present a solution for constructing semantic data services for an earth-sciences data Grid (the UK NERC DataGrid). A semantically-rich data model is developed, drawing on components from external ontologies. A 'storage descriptor' provides the mechanism for mapping legacy file-based storage onto data model instances. Finally, data services may be built on top of the data model to expose a semantic view of the data irrespective of the underlying file storage details. Our approach is similar to wrapper/mediator architectures for integrating database management systems.
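The 'storage descriptor' idea in the abstract above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the class names `StorageDescriptor` and `PointSeriesFeature`, the two toy "legacy formats", and the in-memory `store` standing in for files on disk. It is not the NERC DataGrid implementation; it only shows how a descriptor might hide file layout and format behind a data-model instance that aggregates several files.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical readers for two toy legacy formats; each parses text into floats.
def read_comma_separated(text: str) -> List[float]:
    return [float(v) for v in text.split(",")]

def read_whitespace_separated(text: str) -> List[float]:
    return [float(v) for v in text.split()]

# Registry of format names to readers (names are assumptions for this sketch).
READERS: Dict[str, Callable[[str], List[float]]] = {
    "csv": read_comma_separated,
    "ws": read_whitespace_separated,
}

@dataclass
class StorageDescriptor:
    """Maps one logical variable onto an ordered list of legacy files."""
    variable: str
    file_names: List[str]
    file_format: str

    def extract(self, store: Dict[str, str]) -> List[float]:
        # Aggregate values across all files, in the order listed.
        reader = READERS[self.file_format]
        values: List[float] = []
        for name in self.file_names:
            values.extend(reader(store[name]))
        return values

@dataclass
class PointSeriesFeature:
    """Illustrative data-model instance: a named series of values."""
    name: str
    descriptor: StorageDescriptor

    def values(self, store: Dict[str, str]) -> List[float]:
        # Callers see only the feature; file details stay in the descriptor.
        return self.descriptor.extract(store)

# In-memory stand-in for files on disk (file names are made up).
store = {"t_1990.dat": "1.0 2.0", "t_1991.dat": "3.5"}
feature = PointSeriesFeature(
    name="air_temperature",
    descriptor=StorageDescriptor(
        variable="air_temperature",
        file_names=["t_1990.dat", "t_1991.dat"],
        file_format="ws",
    ),
)
series = feature.values(store)
```

A service built on `PointSeriesFeature` would not need to change if the same variable were later stored in a different format; only the descriptor's `file_format` and `file_names` would differ, which is the sense in which the approach resembles a wrapper/mediator architecture.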