Grid environments increasingly target novel domains such as eScience, eHealth, and digital libraries, which feature a variety of data-intensive applications. Consequently, issues related to data management in Grids are growing in importance. In terms of data management, the Grid allows a large number of replicas of data objects to be kept, possibly with different versions or levels of freshness, to allow for a high degree of availability, reliability, and performance so as to best meet the needs of users and applications. At the same time, replication management must be integrated seamlessly into the Grid, taking its special characteristics into account, without any central component for managing data or metadata. In this paper, we report on the ongoing Re:GRIDiT project, which aims to address all of the above requirements. Re:GRIDiT distinguishes between potentially many updateable and read-only replicas, which can be distributed across a Grid environment. First, Re:GRIDiT provides new protocols for the correct synchronization of concurrent updates to different updateable replicas and for their subsequent propagation in a completely distributed way. Second, Re:GRIDiT takes into account the semantics of the data managed in the Grid: mutable data can be updated, whereas immutable data cannot be changed once created but may be subject to version control. Third, Re:GRIDiT will be dynamic: depending on the current load, new replicas (updateable or read-only) can be created or removed on demand. Fourth, Re:GRIDiT will give read-only transactions full flexibility to specify the freshness (for mutable data) or version number (for immutable data), which is particularly useful
* This work has been partly supported by the Hasler Foundation within the project COSA (Compiling Service-oriented Architectures).
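The freshness option for read-only transactions described in the abstract above can be illustrated with a small Python sketch. It is not taken from Re:GRIDiT: the `Replica` fields, the 0-to-1 scales for freshness and load, the tie-breaking rule, and the `select_replica` function are all assumptions made for this example. It only shows how a read-only request might pick the least-loaded replica that satisfies a requested freshness bound.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Replica:
    """Hypothetical replica descriptor; all fields are illustrative assumptions."""
    site: str          # made-up site name
    updateable: bool   # True for an updateable replica, False for a read-only one
    freshness: float   # assumed scale: 1.0 = fully up to date, 0.0 = arbitrarily stale
    load: float        # assumed scale: 0.0 = idle, 1.0 = saturated


def select_replica(replicas: Sequence[Replica],
                   min_freshness: float) -> Optional[Replica]:
    """Return the least-loaded replica whose freshness meets the bound, or None."""
    candidates = [r for r in replicas if r.freshness >= min_freshness]
    if not candidates:
        return None
    # Lower load wins; on equal load, a read-only replica (updateable=False)
    # is preferred so that updateable sites stay free for update transactions.
    return min(candidates, key=lambda r: (r.load, r.updateable))


# Made-up example data: one updateable replica and two read-only replicas.
EXAMPLE_REPLICAS = [
    Replica(site="site-a", updateable=True, freshness=1.0, load=0.9),
    Replica(site="site-b", updateable=False, freshness=0.8, load=0.2),
    Replica(site="site-c", updateable=False, freshness=0.5, load=0.1),
]

if __name__ == "__main__":
    for bound in (0.0, 0.7, 0.95):
        chosen = select_replica(EXAMPLE_REPLICAS, bound)
        print(bound, chosen.site if chosen else None)
```

In this sketch, a request that tolerates stale data is served by a lightly loaded read-only replica, while a request demanding near-current data falls back to the updateable replica. An analogous selection for immutable data would filter on a version number instead of a freshness value.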
Cloud computing has recently received considerable attention in both industry and academia. Owing to the great success of the first generation of Cloud-based services, providers have to deal with ever larger volumes of data. Quality of service agreements with customers require data to be replicated across data centers in order to guarantee a high degree of availability. In this context, Cloud Data Management has to address several challenges, especially when replicated data are concurrently updated at different sites or when the system workload and the resources requested by clients change dynamically. Largely independently of recent developments in Cloud Data Management, Data Grids have undergone a transition from pure file management with read-only access to more powerful systems. In our recent work, we have developed the Re:GRIDiT protocol for managing data in the Grid, which provides concurrent access to replicated data at different sites without any global component and supports the dynamic deployment of replicas. Since it is independent of the underlying Grid middleware, it can be seamlessly transferred to other environments such as the Cloud. In this paper, we compare Data Management in the Grid and the Cloud, briefly introduce the Re:GRIDiT protocol, and show its applicability to Cloud Data Management.