In a distributed storage system, code symbols are dispersed across space, in nodes or storage units, as opposed to time. In settings such as that of a large data center, an important consideration is the efficient repair of a failed node. Efficient repair calls for erasure codes that, in the face of node failure, minimize the amount of repair data transferred over the network, the amount of data accessed at a helper node, and the number of helper nodes contacted. Coding theory has evolved to handle these challenges by introducing two new classes of erasure codes, namely regenerating codes and locally recoverable codes, as well as by coming up with novel ways to repair the ubiquitous Reed-Solomon code. This survey provides an overview of the efforts in this direction that have taken place over the past decade.

I. INTRODUCTION

This survey article deals with the use of erasure coding for the reliable and efficient storage of large amounts of data in settings such as that of a data center. The amount of data stored in a single data center can run into tens or hundreds of petabytes. Reliability of data storage is ensured in part by introducing redundancy in some form, ranging from simple replication to the use of more sophisticated erasure-coding schemes such as Reed-Solomon codes. Minimizing the storage overhead that comes with ensuring reliability is a key consideration in the choice of erasure-coding scheme. More recently a second problem has surfaced, namely, that of node repair.

In [1], [2] the authors study the Facebook warehouse cluster and analyze the frequency of node failures as well as the resultant network traffic relating to node repair. It was observed in [1] that a median of 50 nodes are unavailable per day and that a median of 180 TB of cross-rack traffic is generated as a result of node unavailability. It was also reported that 98.08% of the cases have exactly one block missing in a stripe.
The erasure code that was deployed in this instance was an [n = 14, k = 10] Reed-Solomon (RS) code. Here n denotes the block length of the code and k the dimension. The conventional repair of an [n, k] RS code is inefficient in that the repair of a single node calls for contacting k other (helper) nodes and downloading k times the amount of data stored in the failed node. Thus there is significant practical interest in the design of erasure-coding techniques that offer low storage overhead and can also be repaired efficiently.

Coding theorists have responded to this need by coming up with two new classes of codes, namely ReGenerating (RG) and Locally Recoverable (LR) codes. The focus in an RG code is on minimizing the amount of data download needed to repair a failed node, termed the repair bandwidth, while LR codes seek to minimize the number of helper nodes contacted for node repair, termed the repair degree. In a different direction, coding theorists have also re-examined the problem of node repair in RS codes and have come up with new and more efficient ...
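To make the cost of conventional repair concrete, the following minimal sketch tallies the figures described above for the [n = 14, k = 10] RS code. The function name `conventional_repair_cost` and the choice to measure download in units of one node's stored data are assumptions made for illustration; they do not come from the survey.

```python
from fractions import Fraction


def conventional_repair_cost(n: int, k: int) -> dict:
    """Cost of repairing ONE failed node of an [n, k] RS code conventionally.

    Hypothetical helper for illustration. Download is expressed in units of
    the amount of data stored in a single node.
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    return {
        # Storage overhead: n symbols stored for every k symbols of data.
        "storage_overhead": Fraction(n, k),
        # Conventional repair contacts k helper nodes ...
        "helpers_contacted": k,
        # ... and downloads the full contents of each of them.
        "download_in_node_units": k,
    }


cost = conventional_repair_cost(14, 10)
print(cost)
```

For the [14, 10] code this gives a storage overhead of 7/5 (that is, 1.4x), with 10 helpers contacted and 10 node-sized units downloaded to rebuild a single node.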
A new class of exact-repair regenerating codes is constructed by stitching together shorter erasure correction codes, where the stitching pattern can be viewed as block designs. The proposed codes have the help-by-transfer property where the helper nodes simply transfer part of the stored data directly, without performing any computation. This embedded error correction structure makes the decoding process straightforward, and in some cases the complexity is very low. We show that this construction is able to achieve performance better than space-sharing between the minimum storage regenerating codes and the minimum repair-bandwidth regenerating codes, and it is the first class of codes to achieve this performance. In fact, it is shown that the proposed construction can achieve a nontrivial point on the optimal functional-repair tradeoff, and it is asymptotically optimal at high rate, i.e., it asymptotically approaches the minimum storage and the minimum repair-bandwidth simultaneously.
This paper presents an explicit construction for an $((n = 2qt,\ k = 2q(t-1),\ d = n-(q+1)),\ (\alpha = q(2q)^{t-1},\ \beta = \alpha/q))$ regenerating code over a field $\mathbb{F}_Q$ operating at the Minimum Storage Regeneration (MSR) point. The MSR code can be constructed to have rate $k/n$ as close to 1 as desired, sub-packetization level $\alpha \le r^{n/r}$ for $r = (n-k)$, field size $Q$ no larger than $n$, and the property that all code symbols can be repaired with the same minimum data download. This is the first-known construction of such an MSR code for $d < (n-1)$.
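The parameter set above can be checked numerically. The sketch below computes (n, k, d, α, β) from q and t, then confirms two relations: the standard MSR-point identity β = α/(d − k + 1), and the stated bound α ≤ r^(n/r) with r = n − k. The names `MsrParams` and `msr_params` are hypothetical, and the input restriction q ≥ 1, t ≥ 2 is an assumption made here only to keep k positive; the paper may impose different conditions.

```python
from typing import NamedTuple


class MsrParams(NamedTuple):
    n: int      # block length
    k: int      # dimension
    d: int      # number of helper nodes contacted for repair
    alpha: int  # sub-packetization level (symbols stored per node)
    beta: int   # symbols downloaded from each helper


def msr_params(q: int, t: int) -> MsrParams:
    """Parameters of the construction stated in the abstract (illustrative)."""
    if q < 1 or t < 2:
        raise ValueError("assumed range: q >= 1 and t >= 2")
    n = 2 * q * t
    k = 2 * q * (t - 1)
    d = n - (q + 1)
    alpha = q * (2 * q) ** (t - 1)
    beta = alpha // q  # exact, since q divides alpha
    return MsrParams(n, k, d, alpha, beta)


p = msr_params(q=2, t=3)
r = p.n - p.k

# MSR point: beta * (d - k + 1) == alpha; here d - k + 1 equals q.
assert p.beta * (p.d - p.k + 1) == p.alpha
# Sub-packetization bound from the abstract; n / r equals t, an integer.
assert p.alpha <= r ** (p.n // r)

print(p)
print("repair download:", p.d * p.beta, "symbols; conventional:", p.k * p.alpha)
```

With q = 2 and t = 3 this yields n = 12, k = 8, d = 9, α = 32, β = 16, so repairing one node downloads 9 · 16 = 144 symbols, compared with 8 · 32 = 256 symbols when the whole file is downloaded from k nodes.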
We present a high-rate $(n, k, d = n-1)$-MSR code with a sub-packetization level that is polynomial in the dimension $k$ of the code. While a polynomial sub-packetization level was achieved earlier for vector MDS codes that repair systematic nodes optimally, no such MSR code construction is known. In the low-rate regime (i.e., rates less than one-half), MSR code constructions with a linear sub-packetization level are available. But in the high-rate regime (i.e., rates greater than one-half), the known MSR code constructions required a sub-packetization level that is exponential in $k$. In the present paper, we construct an MSR code for $d = n-1$ with a fixed rate $R = \frac{t-1}{t}$, $t \ge 2$, achieving a sub-packetization level $\alpha = O(k^t)$. The code allows help-by-transfer repair, i.e., no computations are needed at the helper nodes during repair of a failed node.
In this paper, we study the notion of codes with hierarchical locality that is identified as another approach to local recovery from multiple erasures. The well-known class of codes with locality is said to possess hierarchical locality with a single level. In a code with two-level hierarchical locality, every symbol is protected by an inner-most local code, and another middle-level code of larger dimension containing the local code. We first consider codes with two levels of hierarchical locality, derive an upper bound on the minimum distance, and provide optimal code constructions of low field-size under certain parameter sets. Subsequently, we generalize both the bound and the constructions to hierarchical locality of arbitrary levels.