Abstract: Erasure codes are used extensively in distributed storage systems that handle big data, since they offer significant fault tolerance with low storage overhead. Even though erasure-coded systems are space efficient, their operations incur higher network bandwidth and computational complexity. In this paper, we present RAPID, a protocol for fast data updates, which works by choosing a subset of code blocks for updates and adapts the strength of the subset based on the …
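The delta-style update that the abstract alludes to can be illustrated with a short sketch. The code below is only a generic delta parity update over GF(2^8), not RAPID's actual implementation; the field polynomial, block representation, and function names are assumptions made for illustration.

```python
def gf256_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8) (Russian-peasant method, reducing by `poly`)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def delta_update_parity(parity_block, coeff, old_data, new_data):
    """Recompute a parity block from a data delta only: p' = p + g_ij * (d' - d).
    Over GF(2^w), addition and subtraction are both XOR, so the full stripe
    never has to be re-encoded."""
    return bytes(
        p ^ gf256_mul(coeff, d_new ^ d_old)
        for p, d_old, d_new in zip(parity_block, old_data, new_data)
    )
```

Updating a single data block then touches only that block and the parity blocks whose encoding coefficient for it is non-zero, which is exactly the set of blocks an update protocol can try to shrink or defer.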
“…As stated before, DU depends on encoding or its delta style. Several excellent techniques have been proposed in the literature that significantly improve encoding efficiency, such as Bitmatrix Normalization (BN) [41], Smart Scheduling (SS) [41], Randomization [6], Interference Alignment [45], Update-Efficient Regenerating Codes (UERC) [26], Matching [27], and RAPID [4]. Vectorization [64], in contrast, is a typical technique for accelerating the computation process.…”
Section: A. Computation Optimization
“…Eq. (11) shows an example of BDM with the [7,4,3] Hamming code [6], which has an update complexity of 4. That is, if any single block is changed, at most 4 blocks are required to be updated (e.g., if d_4 is changed, the corresponding p_1, p_2, p_3 are required to be updated).…”
Section: Randomization
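To make the update-complexity claim concrete, here is a minimal sketch using one common systematic layout of the [7,4,3] Hamming code's parity equations; the cited BDM may order its rows differently, and the block and function names are hypothetical.

```python
# One common systematic [7,4,3] Hamming layout: each parity is the XOR of a
# subset of data blocks. d4 appears in every equation, so changing d4 touches
# p1, p2 and p3, i.e. 4 blocks in total -- the update complexity quoted above.
PARITY_EQUATIONS = {
    "p1": ("d1", "d2", "d4"),
    "p2": ("d1", "d3", "d4"),
    "p3": ("d2", "d3", "d4"),
}

def affected_parities(data_block):
    """Parity blocks that must be rewritten when `data_block` changes."""
    return [p for p, deps in PARITY_EQUATIONS.items() if data_block in deps]

def update(blocks, data_block, new_value):
    """In-place delta update; returns how many blocks were touched."""
    delta = blocks[data_block] ^ new_value        # XOR delta over GF(2)
    blocks[data_block] = new_value
    for p in affected_parities(data_block):
        blocks[p] ^= delta                        # parity delta equals data delta
    return 1 + len(affected_parities(data_block))
```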
“…Another approach for optimizing DU is to reduce the total number of parity updates, which is the approach taken by RAPID [4]. Akash et al. recognized that it is not necessary to update parity nodes if no failures occur in the current update round.…”
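The idea of skipping parity writes while no failure is pending can be sketched as follows. This is only an illustration of deferred parity updates for a binary code with assumed names, not the actual RAPID protocol.

```python
from collections import defaultdict

class DeferredParityUpdater:
    """Illustrative sketch: buffer data deltas and push them to parity blocks
    lazily instead of on every write (class and method names are hypothetical)."""

    def __init__(self, parity_blocks, coefficients):
        self.parity = parity_blocks        # {parity_id: int bit-vector}
        self.coeff = coefficients          # {(data_id, parity_id): 0 or 1}
        self.pending = defaultdict(int)    # accumulated XOR delta per data block

    def write(self, data_id, delta):
        # Fast path: record the delta, touch no parity node yet.
        self.pending[data_id] ^= delta

    def flush(self):
        # Slow path, run on failure detection or a periodic sync:
        # apply every buffered delta to the parity blocks that depend on it.
        for data_id, delta in self.pending.items():
            for parity_id in self.parity:
                if self.coeff.get((data_id, parity_id), 0):
                    self.parity[parity_id] ^= delta
        self.pending.clear()
```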
Erasure coding is the leading technique for achieving resilient redundancy in cloud storage systems. However, it introduces two prominent issues: data repair and data update. Compared with data repair, data updates are much more common. A variety of update schemes based on erasure coding have been proposed in the literature to optimize data updates, targeting computation optimization, network traffic overhead reduction, I/O overhead reduction, and modern hardware acceleration. However, these techniques were previously proposed in isolation. In this work, we seek to summarize them systematically and group them in a new form. First, we survey state-of-the-art research and introduce existing classifications. Moreover, based on our observations, we propose two classifications: a resource-based classification and a tier-based classification. In the resource-based classification, we group these techniques according to the resource they optimize and introduce them in detail. In the tier-based classification, we propose a novel hybrid technique framework with five tiers and conduct a comprehensive comparison between these techniques. We conjecture that most techniques in different tiers can be used jointly. Finally, we conclude with research challenges and potential future work. Index Terms: data update, cloud storage, erasure coding, survey
“…Therefore, it is important to improve the update efficiency of erasure codes. Motivated by this, considerable effort has recently been devoted to optimizing update performance, both by reducing I/Os and by lowering network transmission latency [6], [14]-[16]. Existing update schemes for erasure codes, such as Azure [17] and CodFS [14], adopt a log-based data update or a hybrid of in-place data updates and log-based parity updates to reduce I/Os by sequentially appending updates.…”
Section: A. Motivation
“…Existing update schemes for erasure codes, such as Azure [17] and CodFS [14], adopt a log-based data update or a hybrid of in-place data updates and log-based parity updates to reduce I/Os by sequentially appending updates. Alternatively, the authors in [6], [15], [16] try to mitigate the network transfer overhead by optimizing the update schedule and procedure.…”
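As an illustration of the hybrid scheme described in the quotes above, the sketch below updates data blocks in place while appending parity deltas to a log that is merged later; the class and method names are assumptions, not CodFS's or Azure's actual interfaces.

```python
class HybridUpdateStore:
    """Sketch of a hybrid update path: in-place data updates plus log-based
    parity updates, so parity writes become sequential appends."""

    def __init__(self, data, parity):
        self.data = data               # {block_id: int}
        self.parity = parity           # {parity_id: int}
        self.parity_log = []           # appended (parity_id, delta) records

    def update_data(self, block_id, new_value, parity_ids):
        delta = self.data[block_id] ^ new_value
        self.data[block_id] = new_value        # in-place data update (random write)
        for pid in parity_ids:                 # log-based parity update (append)
            self.parity_log.append((pid, delta))

    def merge_parity_log(self):
        # Background merge turns many small parity writes into one pass.
        for pid, delta in self.parity_log:
            self.parity[pid] ^= delta
        self.parity_log.clear()
```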
Owing to the high availability and space efficiency of erasure codes, they have become the de facto standard for providing data durability in large-scale distributed storage systems. The update-intensive workloads of erasure codes lead to a large amount of data transmission and I/O. As a result, reducing the amount of data transmission and optimizing the use of existing network resources, so that the update efficiency of erasure codes can be improved, becomes a major challenge. However, very little research has been done to optimize the update efficiency of erasure codes under multiple QoS metrics. In this paper, our proposed update scheme, the Ant Colony Optimization-based multi-data-node Update Scheme (ACOUS), employs a two-stage rendezvous data update procedure to optimize updates across multiple data nodes. Specifically, the two-stage rendezvous data update procedure performs data delta collection and parity delta distribution over a multi-objective update tree built by an ant colony optimization routing algorithm. Under typical data center network topologies, extensive experimental results show that, compared to the traditional TA-Update scheme, our scheme achieves a 26% to 37% reduction in update delay with convergence guarantees, at the cost of negligible computation overhead.
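The two-stage rendezvous procedure can be sketched independently of how the update tree is built (ACOUS builds it with ant colony optimization, which is omitted here). In the sketch below, stage one XOR-aggregates each updated node's contribution to every parity delta as it flows up the collection tree, and stage two applies one delta per parity node; the tree format, binary coefficient map, and function names are assumptions made for illustration.

```python
def collect_deltas(tree, node, data_deltas, coeff, parity_ids):
    """Stage 1: walk the collection tree bottom-up, combining each updated
    data node's contribution to every parity delta on the way to the root."""
    combined = {pid: 0 for pid in parity_ids}
    if node in data_deltas:                      # this node updated its data block
        for pid in parity_ids:
            if coeff.get((node, pid), 0):        # assumed binary code: coefficient 0 or 1
                combined[pid] ^= data_deltas[node]
    for child in tree.get(node, []):
        child_sum = collect_deltas(tree, child, data_deltas, coeff, parity_ids)
        for pid in parity_ids:
            combined[pid] ^= child_sum[pid]
    return combined

def two_stage_update(tree, root, data_deltas, coeff, parity_blocks):
    """Stage 2: the rendezvous node (tree root) sends each parity node its delta."""
    parity_ids = list(parity_blocks)
    parity_deltas = collect_deltas(tree, root, data_deltas, coeff, parity_ids)
    for pid, delta in parity_deltas.items():
        parity_blocks[pid] ^= delta              # exactly one write per parity node
```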