This paper presents first steps towards implementing a data layer to support a semi-automated preservation management system for research data in the arts and humanities. We suggest using e-Science technology and grid middleware to implement a virtualised storage system for research data. We outline how iRODS (Rule-Oriented Data management System) can be used within an architecture to implement complex, automated, scalable digital preservation strategies.
This paper presents the application profile for machine-actionable data management plans that allows information from traditional data management plans to be expressed in a machine-actionable way. We describe the methodology and research conducted to define the application profile. We also discuss design decisions made during its development and present systems which have adopted it. The application profile was developed in an open and consensus-driven manner within the DMP Common Standards Working Group of the Research Data Alliance and is its official recommendation.
The iRODS system, created by the San Diego Supercomputer Center, is a rule-oriented data management system that allows the user to create sets of rules to define how the data is to be managed. Each rule corresponds to a particular action or operation (such as checksumming a file), and the system is flexible enough to allow the user to create new rules for new types of operations. The iRODS system can interface to any storage system (provided an iRODS driver is built for that system) and relies on its metadata catalogue to provide a virtual file-system that can handle files of any size and type. However, some storage systems (such as tape systems) do not handle small files efficiently and prefer small files to be packaged up (or "bundled") into larger units. We have developed a system that can bundle small data files of any type into larger units, called mounted collections. The system can create collection families and contains its own extensible metadata, including metadata on which family a collection belongs to. The mounted collection system can work standalone and is being incorporated into the iRODS system to enhance the system's flexibility in handling small files. In this paper we describe the motivation for creating a mounted collection system, its architecture and how it has been incorporated into the iRODS system. We describe the different technologies used to create the mounted collection system and provide some performance numbers.
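The bundling idea described above can be illustrated with a minimal sketch. It is not the authors' mounted collection implementation and does not use any iRODS API; it assumes a plain tar archive as the container, and the manifest file name, the `family` field and the function names are all hypothetical choices made for illustration.

```python
import hashlib
import io
import json
import tarfile
from typing import Dict

MANIFEST_NAME = "MANIFEST.json"  # hypothetical name for the bundle's metadata record


def bundle_files(files: Dict[str, bytes], family: str) -> bytes:
    """Pack small files into one tar archive that carries its own metadata."""
    manifest = {"family": family, "members": {}}
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            # Record a checksum per member so the bundle is self-describing.
            manifest["members"][name] = hashlib.sha256(data).hexdigest()
        payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
        info = tarfile.TarInfo(name=MANIFEST_NAME)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_manifest(bundle: bytes) -> dict:
    """Return the metadata record stored inside a bundle."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r") as tar:
        return json.load(tar.extractfile(MANIFEST_NAME))


def extract_member(bundle: bytes, name: str) -> bytes:
    """Read a single small file back out of the bundle."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r") as tar:
        return tar.extractfile(name).read()
```

A storage back end that prefers large objects would then receive the single archive returned by `bundle_files`, while the manifest keeps each small file individually addressable and records which family the bundle belongs to.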
This paper describes a method for reliably managing files distributed in different kinds of Data Grids with RNS (Resource Namespace Service). RNS provides hierarchical namespace management for name-to-resource mapping as a key technology when using Grid resources for different kinds of middleware. We define attribute expressions in XML for the RNS entries and give algorithms to access distributed files stored within different kinds of Data Grids. The volume of digital data and the size of individual files are increasing due to the introduction of high-resolution images, high-definition audiovisual files, etc. The reliable storage of such large files is becoming problematic with whole-file replication, because a failure in the integrity of the file is difficult to localise. Our method involves managing large files in Data Grids by splitting them into smaller units in a traceable manner and then managing the smaller units. The RNS catalog service contains EPR (Endpoint Reference) and metadata that describe the original locations as well as the checksum values. The example in this paper shows how our Grid application can retrieve the actual file locations and the checksum values from the RNS service via SAGA (A Simple API for Grid Applications). An application can access the distributed files as though they were files in the local file-system without worrying about the underlying Data Grids. This approach can be used with various Data Grid systems to enhance file reliability.
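The split-and-checksum approach described above can be sketched as follows. This is an illustrative stand-in only: it uses an in-memory list of dictionaries in place of the RNS catalog, makes no RNS or SAGA calls, and the function names, field names and choice of SHA-256 are assumptions rather than details taken from the paper.

```python
import hashlib
from typing import Dict, List, Tuple


def split_with_checksums(data: bytes, chunk_size: int) -> Tuple[List[bytes], List[Dict]]:
    """Split data into fixed-size units and build a catalogue entry for each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks, catalogue = [], []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        chunks.append(chunk)
        # Offset and length make each unit traceable back to the original file.
        catalogue.append({
            "index": index,
            "offset": offset,
            "length": len(chunk),
            "sha256": hashlib.sha256(chunk).hexdigest(),
        })
    return chunks, catalogue


def find_corrupt_chunks(chunks: List[bytes], catalogue: List[Dict]) -> List[int]:
    """Return the indices of units whose checksum no longer matches the catalogue."""
    return [
        entry["index"]
        for chunk, entry in zip(chunks, catalogue)
        if hashlib.sha256(chunk).hexdigest() != entry["sha256"]
    ]


def reassemble(chunks: List[bytes], catalogue: List[Dict]) -> bytes:
    """Rebuild the original file, refusing to do so if any unit is damaged."""
    bad = find_corrupt_chunks(chunks, catalogue)
    if bad:
        raise ValueError(f"corrupt chunks: {bad}")
    return b"".join(chunks)
```

Because each unit has its own checksum, an integrity failure is localised to the affected unit, so only that unit needs to be fetched again from a replica rather than the whole file.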
We present an evaluation of the European Data Grid software in the framework of the BaBar experiment. Two kinds of applications have been considered: first, a typical data analysis on real data producing physics n-tuples, and second, a distributed Monte-Carlo production on a computational grid. Both applications will be crucial in the near future in order to make optimal use of the distributed computing resources available throughout the collaboration.