Erasure coding has been recognized as a powerful method to mitigate delays due to slow or straggling nodes in distributed systems. This work shows that erasure coding of data objects can flexibly handle skews in the request rates. Coding can help boost the service rate region, that is, increase the overall volume of data access requests that the system can handle. This paper aims to establish the service rate region as an important consideration in the design of erasure-coded distributed systems. We highlight several open problems that can be grouped into two broad threads: 1) characterizing the service rate region of a given code and finding the optimal request allocation, and 2) designing the underlying erasure code for a given service rate region. As contributions along the first thread, we characterize the service rate regions of maximum-distance-separable, locally repairable, and simplex codes. In terms of code design, we show the effectiveness of hybrid codes that combine replication and erasure coding. We also discover fundamental connections between multi-set batch codes and the problem of maximizing the service rate region.
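To make the notion of a service rate region concrete, here is a minimal sketch under assumptions that are not taken from the abstract: three nodes store a, b, and a XOR b (a small maximum-distance-separable code), and each node can serve requests at rate at most mu. A request for a is served either directly by node 1 or through the recovery set {node 2, node 3}, and symmetrically for b. The function name `in_service_region` and the brute-force grid search are illustrative choices only, not the method of the paper.

```python
def in_service_region(lam_a, lam_b, mu=1.0, steps=100):
    """Check whether request rates (lam_a, lam_b) can be served.

    Assumed toy system: node 1 stores a, node 2 stores b, node 3 stores a XOR b,
    and every node has service capacity mu. The search is over how much of each
    object's demand is routed through its two-node recovery set.
    """
    eps = 1e-9  # tolerance for floating-point comparisons
    for i in range(steps + 1):
        x2 = mu * i / steps      # rate of a served via {node 2, node 3}
        x1 = lam_a - x2          # rate of a served directly by node 1
        if x1 < -eps:
            break
        for j in range(steps + 1):
            y2 = mu * j / steps  # rate of b served via {node 1, node 3}
            y1 = lam_b - y2      # rate of b served directly by node 2
            if y1 < -eps:
                break
            node1_ok = x1 + y2 <= mu + eps
            node2_ok = y1 + x2 <= mu + eps
            node3_ok = x2 + y2 <= mu + eps
            if node1_ok and node2_ok and node3_ok:
                return True
    return False
```

In this toy system, `in_service_region(2.0, 0.0)` holds because the coded node lets all three nodes work on requests for a, while `in_service_region(2.0, 0.5)` fails because serving a at rate 2 already saturates every node. The grid resolution `steps` limits accuracy for rates that are not multiples of `mu / steps`.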
Contention at the storage nodes is the main cause of long and variable data access times in distributed storage systems. Offered load on the system must be balanced across the storage nodes in order to minimize contention, and load balance in the system should be robust against skews and fluctuations in content popularity. In practice, data objects are replicated across multiple nodes to allow for load balancing. However, redundancy increases the storage requirement and should be used efficiently. We evaluate the load balancing performance of natural storage schemes in which each data object is stored at d different nodes and each node stores the same number of objects. We find that load balance in a system of n nodes improves multiplicatively with d as long as d = o(log n), and improves exponentially as soon as d = Θ(log n). We show that the load balance in the system improves in the same way with d when the service choices are created with XORs of r objects rather than object replicas, which also reduces the storage overhead multiplicatively by r. However, unlike accessing an object replica, access through a recovery set formed from an XOR'ed object copy requires downloading content from r nodes, which increases the load imbalance in the system additively by r.
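The cost of access through an XOR'ed copy can be illustrated with a small sketch. The object contents, the helper name `xor_bytes`, and the choice r = 3 are assumptions made for illustration; the only point shown is that recovering one object from an XOR of r objects needs the coded copy plus the other r - 1 objects, that is, r downloads in total.

```python
from functools import reduce


def xor_bytes(*blocks):
    """Return the bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical equal-size objects (illustrative data only).
a, b, c = b"obj-A", b"obj-B", b"obj-C"
r = 3

# One node stores the XOR of r objects instead of a replica of a.
coded = xor_bytes(a, b, c)

# To serve a request for a through this copy, download the coded block
# and the remaining r - 1 objects from the nodes that hold them.
downloads = [coded, b, c]
recovered = xor_bytes(*downloads)
```

Here `recovered` equals `a` and `len(downloads)` equals `r`, whereas reading a plain replica of a would touch a single node.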