Erasure correcting codes are widely used to ensure data persistence in distributed storage systems. This paper addresses the simultaneous repair of multiple failures in such codes. We go beyond existing work (i.e., regenerating codes by Dimakis et al.) by describing (i) coordinated regenerating codes (also known as cooperative regenerating codes), which support the simultaneous repair of multiple devices, and (ii) adaptive regenerating codes, which allow the parameters to be adapted at each repair. Like the regenerating codes of Dimakis et al., these codes achieve the optimal tradeoff between storage and repair bandwidth. Building on these extended regenerating codes, we study the impact of lazy repairs applied to regenerating codes and conclude that lazy repairs cannot reduce costs in terms of network bandwidth, but do reduce disk-related costs (disk bandwidth and disk I/O).
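As background for the storage/repair-bandwidth tradeoff mentioned above, the sketch below computes its two classic extreme points from Dimakis et al. (minimum storage, MSR, and minimum bandwidth, MBR) for a file of size M, decodable from any k nodes, with d helpers per repair. The `mscr_repair_bw` expression for coordinated repair of t simultaneous failures follows the coordinated regenerating codes literature; treat it as an assumption of this sketch rather than a quotation from the paper (note it reduces to the MSR repair bandwidth when t = 1).

```python
# Sketch of the storage/repair-bandwidth tradeoff points for
# regenerating codes (Dimakis et al.). M: file size, k: nodes
# needed to decode, d: helpers contacted per repair, t: number
# of nodes repaired simultaneously (coordinated repair).

def msr_point(M, k, d):
    """Minimum Storage Regenerating (MSR) point."""
    alpha = M / k                        # storage per node
    gamma = d * M / (k * (d - k + 1))    # network traffic per repair
    return alpha, gamma

def mbr_point(M, k, d):
    """Minimum Bandwidth Regenerating (MBR) point."""
    alpha = 2 * M * d / (k * (2 * d - k + 1))  # storage per node
    gamma = alpha                              # repair traffic equals storage
    return alpha, gamma

def mscr_repair_bw(M, k, d, t):
    """Assumed MSCR repair bandwidth per repaired node when t
    failures are repaired in coordination (alpha stays M/k)."""
    return M * (d + t - 1) / (k * (d - k + t))

if __name__ == "__main__":
    M, k, d = 1.0, 4, 7
    print("MSR :", msr_point(M, k, d))        # (0.25, 0.4375)
    print("MBR :", mbr_point(M, k, d))
    print("MSCR:", mscr_repair_bw(M, k, d, t=2))
```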
We study the exact and optimal repair of multiple failures in codes for distributed storage. More specifically, we examine the use of interference alignment to build exact scalar minimum storage coordinated regenerating codes (MSCR). We show that it is possible to build such codes for k = 2 and d ≥ k by aligning interference independently, but that this technique cannot be applied as soon as k ≥ 3 and d > k. Our results also apply to adaptive regenerating codes.
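To make the alignment technique concrete, here is the standard generic formulation of interference alignment in exact repair; the notation ($x_j$, $G_{ij}$, $v_i$) is illustrative and not taken from the paper.

```latex
% Generic interference-alignment view of exact repair (illustrative).
% Helper i stores c_i = \sum_j G_{ij} x_j and sends the projection v_i^T c_i
% when node 1 (holding data derived from x_1) must be rebuilt:
\[
  v_i^\top c_i
  \;=\; \underbrace{v_i^\top G_{i1}\, x_1}_{\text{desired}}
  \;+\; \sum_{j \neq 1} \underbrace{v_i^\top G_{ij}\, x_j}_{\text{interference}} .
\]
% Exact repair with minimal bandwidth requires, for each j != 1, the
% interference vectors {v_i^T G_{ij}}_i to be aligned (e.g., collinear),
% so that all interference can be cancelled with few downloaded symbols.
```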
The explosion of the amount of data stored in cloud systems calls for more efficient paradigms for redundancy. While replication is widely used to ensure data availability, erasure correcting codes provide a much better trade-off between storage and availability. Regenerating codes are good candidates, for they also offer low repair costs in terms of network bandwidth. While they have been proven optimal, they are difficult to understand and parameterize. In this paper we provide an analysis of regenerating codes that allows practitioners to grasp the various trade-offs. More specifically, we make two contributions: (i) we study the impact of the parameters by conducting an analysis at the level of the system, rather than at the level of a single device; (ii) we compare the computational costs of various implementations of codes and highlight the most efficient ones. Our goal is to provide system designers with concrete information to help them choose the best parameters and designs for regenerating codes.
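As a toy illustration of the system-level view advocated here, the sketch below compares the yearly repair traffic of a whole cluster under a classic (n, k) erasure code, whose repair reads k fragments, against an MSR regenerating code contacting d helpers. The per-node failure rate and parameter values are simplifying assumptions for illustration only.

```python
# Toy system-level comparison of repair traffic (illustrative assumptions).

def classic_repair_traffic(M, k):
    # Classic erasure code: k fragments of size M/k are downloaded
    # and re-encoded to rebuild one lost fragment.
    return M

def msr_repair_traffic(M, k, d):
    # MSR regenerating code: d helpers each send M / (k*(d-k+1)).
    return d * M / (k * (d - k + 1))

def yearly_system_traffic(n, failures_per_node_year, per_repair):
    # Expected repair traffic across all n nodes, per year.
    return n * failures_per_node_year * per_repair

M, n, k, d = 1.0, 16, 8, 15
rs  = yearly_system_traffic(n, 0.1, classic_repair_traffic(M, k))
msr = yearly_system_traffic(n, 0.1, msr_repair_traffic(M, k, d))
print(f"classic: {rs:.3f} x M/year, MSR: {msr:.3f} x M/year")
```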
Network coding has been successfully applied in large-scale content dissemination systems. While network coding provides optimal throughput, its current forms suffer from high decoding complexity. This is an issue when it is applied to systems composed of nodes with low processing capabilities, such as sensor networks. In this paper, we propose a novel network coding approach based on LT codes, initially introduced in the context of erasure coding. Our coding scheme, called LTNC, fully benefits from the low complexity of belief propagation decoding. Yet, such decoding schemes are extremely sensitive to the statistical properties of the code, and maintaining these properties in a fully decentralized way, with only a subset of the encoded data, is challenging. This is precisely what the recoding algorithms of LTNC achieve. We evaluate LTNC against random linear network codes in an epidemic content-dissemination application. Results show that LTNC increases communication overhead (by 20%) and convergence time (by 30%) but greatly reduces decoding complexity (by 99%) compared to random linear network codes. In addition, LTNC consistently outperforms dissemination protocols without codes, thus preserving the benefit of coding.
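For readers unfamiliar with the decoding step LTNC relies on, below is a minimal sketch of belief-propagation (peeling) decoding for LT codes: each encoded symbol is the XOR of randomly chosen source symbols, and decoding repeatedly resolves degree-1 symbols. The symbol representation is an illustrative assumption, not LTNC's actual format.

```python
# Minimal belief-propagation (peeling) decoder for LT codes.
# Each encoded symbol is (set of source indices, XOR of those symbols).

def lt_decode(encoded, n_src):
    src = [None] * n_src                     # recovered source symbols
    pending = [(set(ix), val) for ix, val in encoded]
    progress = True
    while progress and any(s is None for s in src):
        progress = False
        nxt = []
        for ix, val in pending:
            for j in list(ix):               # peel already-recovered symbols
                if src[j] is not None:
                    val ^= src[j]
                    ix.discard(j)
            if len(ix) == 1:                 # degree-1: recover a new symbol
                (j,) = ix
                if src[j] is None:
                    src[j] = val
                progress = True
            elif len(ix) > 1:
                nxt.append((ix, val))
        pending = nxt
    return src

# Tiny usage example: integers stand in for data blocks.
s = [0b001, 0b010, 0b100]
enc = [({0}, s[0]), ({0, 1}, s[0] ^ s[1]), ({1, 2}, s[1] ^ s[2])]
assert lt_decode(enc, 3) == s
```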
Efficient Nearest Neighbor (NN) search in high-dimensional spaces is a foundation of many multimedia retrieval systems. Because it offers low response times, Product Quantization (PQ) is a popular solution. PQ compresses high-dimensional vectors into short codes using several sub-quantizers, which enables in-RAM storage of large databases. This allows fast answers to NN queries, without accessing the SSD or HDD. The key feature of PQ is that it can compute distances between short codes and high-dimensional vectors using cache-resident lookup tables. The efficiency of this technique, named Asymmetric Distance Computation (ADC), remains limited because it performs many cache accesses. In this paper, we introduce Quick ADC, a novel technique that achieves a 3 to 6 times speedup over ADC by exploiting Single Instruction Multiple Data (SIMD) units available in current CPUs. Efficiently exploiting SIMD requires algorithmic changes to the ADC procedure. Namely, Quick ADC relies on two key modifications of ADC: (i) the use of 4-bit sub-quantizers instead of the standard 8-bit sub-quantizers, and (ii) the quantization of floating-point distances. This allows Quick ADC to exceed the performance of state-of-the-art systems, e.g., it achieves a Recall@100 of 0.94 in 3.4 ms on 1 billion SIFT descriptors (128-bit codes).
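To illustrate the two modifications, here is a scalar sketch of ADC with 4-bit sub-quantizers and 8-bit quantized distance tables. On real hardware each 16-entry uint8 table fits in a single SIMD register, so the 16 lookups are performed by one in-register shuffle instruction (e.g., pshufb), which is where the speedup comes from; the shapes and quantization scheme below are illustrative assumptions.

```python
import numpy as np

# Scalar sketch of Quick-ADC-style distance computation (illustrative).
# m sub-quantizers, each with 16 centroids (4-bit codes); float distance
# tables are quantized to uint8 so each fits in one SIMD register.

def build_tables(query_subdists):
    """query_subdists: float array (m, 16) of query-to-centroid distances.
    Returns uint8 tables plus the offset/scale needed to de-quantize."""
    dmin = float(query_subdists.min())
    scale = (float(query_subdists.max()) - dmin) / 255.0
    tables = np.round((query_subdists - dmin) / scale).astype(np.uint8)
    return tables, dmin, scale

def adc_distance(code, tables, dmin, scale):
    """code: uint8 array (m,) of 4-bit sub-quantizer indices (0..15)."""
    acc = sum(int(tables[j, code[j]]) for j in range(len(code)))
    return acc * scale + len(code) * dmin   # approximate de-quantization

m = 32                                      # 32 x 4 bits = 128-bit codes
subdists = np.random.rand(m, 16).astype(np.float32)
tables, dmin, scale = build_tables(subdists)
code = np.random.randint(0, 16, size=m).astype(np.uint8)
print(adc_distance(code, tables, dmin, scale))
```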
Efficient peer-to-peer backup services through buffering at the edge
The availability of end devices in peer-to-peer storage and backup systems has been shown to be critical for usability and for system reliability in practice. This has led to the adoption of hybrid architectures composed of both peers and servers. Such architectures mask the instability of peers, thus approaching the performance of client-server systems while providing scalability at a low cost. In this paper, we advocate replacing such servers with a cloud of residential gateways, which are already present in users' homes, thus pushing the required stable components to the edge of the network. In our gateway-assisted system, gateways act as buffers between peers, compensating for their intrinsic instability. This makes it possible to offload backup tasks quickly from the user's machine to the gateway, while significantly lowering the retrieval time of backed-up data. We evaluate our proposal using real-world traces, including existing traces from Skype and Jabber, a trace of residential gateways for availability, and a residential broadband trace for bandwidth. Results show that the time required to back up data in the network is comparable to a server-assisted approach, while the time to restore data improves substantially, dropping from a few days to a few hours. As gateways are becoming increasingly powerful in order to enable new services, we expect such a proposal to be deployable in the short term.
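As a sketch of the buffering role described above (the class, method names, and event model are illustrative, not the paper's protocol): the gateway accepts blocks from its local machine immediately, letting that machine go offline, and drains the buffered blocks to remote peers whenever they come online.

```python
from collections import deque

# Illustrative sketch of gateway-as-buffer behaviour: the user's machine
# hands blocks to its always-on gateway, which forwards them to remote
# peers opportunistically, whenever those peers happen to be online.

class Gateway:
    def __init__(self):
        self.buffer = deque()            # blocks awaiting placement on peers

    def accept_from_local_peer(self, block):
        # The user's upload completes here, so the machine can go offline.
        self.buffer.append(block)

    def drain_to(self, online_peers):
        # Forward buffered blocks round-robin to currently online peers.
        placed = []
        while self.buffer and online_peers:
            block = self.buffer.popleft()
            peer = online_peers[len(placed) % len(online_peers)]
            placed.append((block, peer))
        return placed

gw = Gateway()
for b in ["b0", "b1", "b2"]:
    gw.accept_from_local_peer(b)
print(gw.drain_to(online_peers=["peerA", "peerB"]))
```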