Data grows at the impressive rate of 50% per year, and 75% of the digital world is a copy [1]! Although keeping multiple copies of data is necessary to guarantee its availability and long-term durability, in many situations the amount of data redundancy is excessive. By keeping a single copy of repeated data, data deduplication is considered one of the most promising ways to reduce storage costs and improve user experience by saving network bandwidth and reducing backup time. However, deduplication raises a number of security issues that must be addressed before it can be fully satisfactory. In this paper we target attacks from malicious clients based on the manipulation of data identifiers, and attacks based on observing backup time and network traffic. We present a deduplication scheme combining intra-user and inter-user deduplication in order to build a storage system that is secure against the aforementioned types of attacks, by controlling the correspondence between files and their identifiers, and by making the inter-user deduplication unnoticeable to clients through the use of deduplication proxies. Our method provides global storage space savings, per-client network bandwidth savings between clients and deduplication proxies, and global network bandwidth savings between deduplication proxies and the storage server. The evaluation of our solution against a classic system shows that the overhead introduced by our scheme is mostly due to the data encryption required to ensure confidentiality.
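The identifier-manipulation attacks mentioned above exploit the fact that deduplication systems typically index files by a content-derived identifier and may trust the identifier a client supplies. As an illustrative sketch only (not the paper's actual scheme; the class and method names are hypothetical), a server can defend against this by recomputing the identifier from the uploaded bytes before deduplicating:

```python
import hashlib

class DedupStore:
    """Toy content-addressed store that recomputes identifiers server-side."""

    def __init__(self):
        self.blobs = {}  # identifier -> file bytes (a single copy per content)

    def upload(self, claimed_id: str, data: bytes) -> bool:
        # Recompute the identifier instead of trusting the client's claim,
        # blocking attacks that bind a bogus identifier to some content.
        actual_id = hashlib.sha256(data).hexdigest()
        if claimed_id != actual_id:
            return False  # reject a manipulated identifier
        self.blobs.setdefault(actual_id, data)  # deduplicate: store once
        return True
```

The check costs one hash per upload but ensures the file-to-identifier mapping cannot be forged by a single malicious client.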
Internet Service Providers (ISPs) furnishing cloud storage services usually rely on big data centers. These centralized architectures induce many drawbacks in terms of scalability, reliability, and access latency, as data centers are single points of failure and are not necessarily located close to the users. This paper introduces Mistore, a distributed storage system aiming to guarantee data availability, durability, and low access latency by leveraging the Digital Subscriber Line (xDSL) infrastructure of an ISP. Mistore uses the available storage resources of a large number of home gateways and points of presence for content storage and caching, reducing the role of the data center to that of a load balancer. Mistore also targets data consistency by providing multiple consistency criteria on content and a versioning system allowing users to access any prior version of their content. Mistore has been validated through extensive simulations.
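The versioning system described above lets a user retrieve any prior version of a content. As a minimal illustration under stated assumptions (the class and methods below are hypothetical, not Mistore's API), such a store can append each write as a new immutable version and serve reads by version number:

```python
class VersionedStore:
    """Toy versioned store: every write creates a new immutable version."""

    def __init__(self):
        self.versions = {}  # key -> list of values; list index = version number

    def put(self, key: str, value: bytes) -> int:
        history = self.versions.setdefault(key, [])
        history.append(value)
        return len(history) - 1  # version number assigned to this write

    def get(self, key: str, version=None) -> bytes:
        history = self.versions[key]
        # Default to the latest version; older versions remain accessible.
        return history[-1] if version is None else history[version]
```

Keeping versions immutable sidesteps in-place updates, which simplifies reasoning about the consistency criteria offered on replicated content.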