Motivated by applications to distributed storage, Gopalan et al. recently introduced the interesting notion of information-symbol locality in a linear code. By this it is meant that each message symbol appears in a parity-check equation of small Hamming weight, thereby enabling recovery of the message symbol by examining a small number of other code symbols. This notion is expanded to the case when all code symbols, not just the message symbols, are covered by such "local" parity. In this paper, we extend the results of Gopalan et al. so as to permit recovery of an erased code symbol even in the presence of errors in local parity symbols. We present tight bounds on the minimum distance of such codes and exhibit codes that are optimal with respect to the local error-correction property. As a corollary, we obtain an upper bound on the minimum distance of a concatenated code.

The i-th code symbol of an [n, k, d] linear code C over the field F_q is said to have locality r if this symbol can be recovered by accessing at most r other code symbols of C. Equivalently, for any coordinate i, there exists a row in the parity-check matrix of the code of Hamming weight at most r + 1, whose support includes i. An (r, d) code was defined as a systematic linear code C having minimum distance d, where all k message symbols have locality r. It was shown that the minimum distance of an (r, d) code is upper bounded by
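For reference, the bound commonly cited from Gopalan et al. for an (r, d) code of length n and dimension k is d ≤ n − k − ⌈k/r⌉ + 2. The following is a minimal sketch that evaluates it, assuming that form of the bound; the function names are hypothetical and chosen only for illustration.

```python
def locality_distance_bound(n: int, k: int, r: int) -> int:
    """Evaluate n - k - ceil(k / r) + 2, the assumed form of the locality bound."""
    if not (1 <= r <= k <= n):
        raise ValueError("expected 1 <= r <= k <= n")
    ceil_k_over_r = -(-k // r)  # integer ceiling of k / r
    return n - k - ceil_k_over_r + 2


def singleton_bound(n: int, k: int) -> int:
    """Classical Singleton bound n - k + 1, shown for comparison."""
    return n - k + 1


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the text.
    n, k, r = 16, 10, 5
    print(locality_distance_bound(n, k, r), singleton_bound(n, k))
```

With r = k the expression reduces to the Singleton bound, which is one quick sanity check on the formula.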
In this paper, we study codes with locality that can recover from two erasures via a sequence of two local parity-check computations. By a local parity-check computation, we mean recovery via a single parity-check equation of small Hamming weight. Earlier approaches considered recovery in parallel; the sequential approach allows us to potentially construct codes with improved minimum distance. These codes, which we refer to as locally 2-reconstructible codes, are a natural generalization, along one direction, of the codes with all-symbol locality introduced by Gopalan et al., in which recovery from a single erasure is considered. By studying the Generalized Hamming Weights of the dual code, we derive upper bounds on the minimum distance of locally 2-reconstructible codes and provide constructions for a family of codes, based on Turán graphs, that are optimal with respect to this bound. The minimum distance bound derived here is universal in the sense that no code which permits all-symbol local recovery from 2 erasures can have larger minimum distance, regardless of the approach adopted. Our approach also leads to a new bound on the minimum distance of codes with all-symbol locality for the single-erasure case.
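The difference between sequential and parallel recovery can be illustrated with a small sketch. It assumes each local parity check is given only by its support (a set of coordinate indices) and that a check can restore a symbol exactly when that symbol is the only erased one in its support. The function names and the toy checks are hypothetical, not taken from the paper.

```python
def sequential_recovery_order(checks, erased):
    """Return an order in which erased symbols can be restored one at a time,
    or None if no local check can make progress."""
    remaining = set(erased)
    order = []
    while remaining:
        for support in checks:
            missing = remaining & set(support)
            if len(missing) == 1:
                (symbol,) = missing
                order.append(symbol)
                remaining.discard(symbol)
                break
        else:
            return None  # every check touches zero or at least two erasures
    return order


def parallel_recoverable(checks, erased):
    """True if every erased symbol has a check containing no other erasure."""
    erased = set(erased)
    return all(
        any(erased & set(support) == {symbol} for support in checks)
        for symbol in erased
    )


if __name__ == "__main__":
    toy_checks = [{0, 1, 2}, {2, 3, 4}]  # hypothetical local check supports
    print(sequential_recovery_order(toy_checks, {1, 2}))
    print(parallel_recoverable(toy_checks, {1, 2}))
```

In the toy example, symbol 2 is restored first from the second check, after which the first check has a single erasure left; neither check alone isolates symbol 1, so the parallel test fails while the sequential one succeeds.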
Regenerating codes and codes with locality are two schemes that have recently been proposed to ensure data collection and reliability in a distributed storage network. In a situation where one is attempting to repair a failed node, regenerating codes seek to minimize the amount of data downloaded for node repair, while codes with locality attempt to minimize the number of helper nodes accessed. In this paper, we provide several constructions for a class of vector codes with locality in which the local codes are regenerating codes, so that the resulting codes enjoy both advantages. We derive an upper bound on the minimum distance of this class of codes and show that the proposed constructions achieve this bound. The constructions include both the cases where the local regenerating codes correspond to the MSR as well as the MBR point on the storage-repair-bandwidth tradeoff curve of regenerating codes.
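As a rough illustration of the two extreme points of the tradeoff curve, the sketch below uses the standard cut-set expressions for regenerating codes, which are assumed here and not stated in the abstract: a file of size B, any k nodes sufficing for reconstruction, and repair from `helpers` nodes that each send β symbols to a node storing α symbols. The parameter name `helpers` is used instead of the customary d to avoid confusion with minimum distance; all names are hypothetical.

```python
from fractions import Fraction


def msr_point(B: int, k: int, helpers: int):
    """Minimum-storage point: alpha = B/k, beta = B / (k (helpers - k + 1))."""
    alpha = Fraction(B, k)
    beta = Fraction(B, k * (helpers - k + 1))
    return alpha, beta


def mbr_point(B: int, k: int, helpers: int):
    """Minimum-bandwidth point: beta = 2B / (k (2 helpers - k + 1)), alpha = helpers * beta."""
    beta = Fraction(2 * B, k * (2 * helpers - k + 1))
    alpha = helpers * beta
    return alpha, beta


def cut_set_capacity(alpha, beta, k: int, helpers: int):
    """Cut-set expression: sum over i = 0..k-1 of min(alpha, (helpers - i) * beta)."""
    return sum(min(alpha, (helpers - i) * beta) for i in range(k))


if __name__ == "__main__":
    B, k, helpers = 12, 3, 4  # illustrative values only
    for name, point in (("MSR", msr_point), ("MBR", mbr_point)):
        alpha, beta = point(B, k, helpers)
        print(name, alpha, beta, cut_set_capacity(alpha, beta, k, helpers))
```

Both points meet the cut-set expression with equality for the file size B; they differ in whether per-node storage α or total repair download `helpers` × β is minimized.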
Node failures are inevitable in distributed storage systems (DSS). To enable efficient repair when faced with such failures, two main techniques are known: regenerating codes, i.e., codes that minimize the total repair bandwidth; and codes with locality, which minimize the number of nodes participating in the repair process. This paper focuses on regenerating codes with locality, using pre-coding based on Gabidulin codes, and presents constructions that utilize minimum bandwidth regenerating (MBR) local codes. The constructions achieve maximum resilience (i.e., optimal minimum distance) and have maximum capacity (i.e., maximum rate). Finally, the same pre-coding mechanism can be combined with a subclass of fractional-repetition codes to enable maximum resilience and repair-by-transfer simultaneously.

I. BACKGROUND

A. Vector Codes

An [n, K, d_min, α] vector code over a field F_q is a code C of block length n, having a symbol alphabet F_q^α for some α > 1, satisfying the additional property that given c, c′ ∈ C and a, b ∈ F_q, ac + bc′ also belongs to C. As a vector space over F_q, C has dimension K, termed the scalar dimension (equivalently, the file size) of the code, and as a code over the alphabet F_q^α, the code has minimum distance d_min. Associated with the vector code C is an F_q-linear scalar code C^(s) of length N = nα, where C^(s) is obtained by expanding each vector symbol within a codeword into α scalar symbols (in some prescribed order). Given a generator matrix G for the scalar code C^(s), the first code symbol in the vector code is naturally associated with the first α columns of G, and so on. We will refer to the collection of α columns of G associated with the i-th code symbol c_i as the i-th thick column and, to avoid confusion, to the columns of G themselves as thin columns.

B. Locality in Vector Codes

Let C be an [n, K, d_min, α] vector code over a field F_q, possessing a (K × nα) generator matrix G.
The i-th code symbol, c_i, is said to have (r, δ) locality, δ ≥ 2, if there exists a punctured code C_i := C|_{S_i} of C (called a local code) with support S_i ⊆ {1, 2, ..., n} such that

The code C is said to have (r, δ) information locality if there exist l code symbols with (r, δ) locality and respective support sets

The code C is said to have (r, δ) all-symbol locality if all code symbols have (r, δ) locality. A code with (r, δ) information (respectively, all-symbol) locality is said to have full (r, δ) information (respectively, all-symbol) locality if all local codes have parameters given by |S_i| = r + δ − 1 and d_min(C_i) = δ, for i = 1, ..., l.

The concept of locality for scalar codes, with δ = 2, was introduced in [1] and extended in [2] and [3] to scalar codes with arbitrary δ, and vector codes with δ = 2, respectively. This was further extended to vector codes with arbitrary δ in [4] and [5], where, in addition to constructions of vector codes with locality, the authors derive minimum distance upper bounds and also consider settings in which the local codes have regeneration properti...
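The thick-column and scalar-expansion conventions from the vector-code definition can be sketched as follows. This is a minimal illustration that assumes G is stored as a list of K rows, each of length nα, and that code symbols are indexed from 1 as in the text; the helper names and the toy matrix are hypothetical.

```python
def thick_column(G, i, alpha):
    """Return the alpha thin columns of G tied to the i-th code symbol (1-indexed),
    as a K x alpha sub-matrix."""
    n = len(G[0]) // alpha
    if not 1 <= i <= n:
        raise IndexError("code symbol index out of range")
    start = (i - 1) * alpha
    return [row[start:start + alpha] for row in G]


def expand_to_scalar(vector_codeword):
    """Flatten a codeword of n vector symbols (each of length alpha) into the
    corresponding length-n*alpha codeword of the scalar code."""
    return [entry for symbol in vector_codeword for entry in symbol]


if __name__ == "__main__":
    # Toy 2 x 6 binary generator matrix with alpha = 2, so n = 3 (illustrative only).
    G = [
        [1, 0, 1, 1, 0, 1],
        [0, 1, 0, 1, 1, 1],
    ]
    print(thick_column(G, 2, alpha=2))
    print(expand_to_scalar([(1, 0), (1, 1), (0, 1)]))
```

Puncturing to a support set S_i then amounts to keeping only the thick columns indexed by S_i.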
Distributed databases often suffer from an unequal distribution of data among storage nodes, which is known as 'data skew'. Data skew arises from a number of causes, such as the removal of existing storage nodes and the addition of new empty nodes to the database. Data skew leads to performance degradation and necessitates 'rebalancing' at regular intervals to reduce the amount of skew. We define an r-balanced distributed database as a distributed database in which the storage across the nodes has uniform size, and each bit of the data is replicated in r distinct storage nodes. We consider the problem of designing such balanced databases along with associated rebalancing schemes which maintain the r-balanced property under node removal and addition operations. We present a class of r-balanced databases (parameterized by the number of storage nodes) which have the property of structural invariance, i.e., the databases designed for different numbers of storage nodes have the same structure. For this class of r-balanced databases, we present rebalancing schemes which use coded transmissions between storage nodes, and characterize their communication loads under node addition and removal. We show that the communication cost incurred to rebalance our distributed database for node addition and removal is optimal, i.e., it achieves the minimum possible cost among all possible balanced distributed databases and rebalancing schemes.
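The r-balanced property defined above can be checked mechanically. The sketch below assumes a placement is given as a mapping from node identifiers to the set of bit indices each node stores, and reads "replicated in r distinct storage nodes" as exactly r copies; the function name and the toy placement are hypothetical.

```python
from collections import Counter


def is_r_balanced(placement, num_bits, r):
    """Check uniform per-node storage and exactly r replicas of every bit.

    placement: dict mapping node id -> set of stored bit indices in range(num_bits)
    """
    sizes = {len(bits) for bits in placement.values()}
    if len(sizes) > 1:
        return False  # storage is not uniform across nodes
    replicas = Counter(bit for bits in placement.values() for bit in bits)
    if set(replicas) - set(range(num_bits)):
        return False  # a node stores an index outside the data
    return all(replicas[bit] == r for bit in range(num_bits))


if __name__ == "__main__":
    # Toy cyclic placement of 3 bits on 3 nodes with r = 2 (illustrative only).
    placement = {"n0": {0, 1}, "n1": {1, 2}, "n2": {2, 0}}
    print(is_r_balanced(placement, num_bits=3, r=2))

    # Removing a node without rebalancing breaks the property.
    after_removal = {node: bits for node, bits in placement.items() if node != "n2"}
    print(is_r_balanced(after_removal, num_bits=3, r=2))
```

A rebalancing scheme in the sense of the abstract would have to move or recode data after the removal so that the check passes again on the remaining nodes.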