Proceedings of the Fourteenth EuroSys Conference 2019
DOI: 10.1145/3302424.3303973

Managing Tail Latency in Datacenter-Scale File Systems Under Production Constraints

Abstract: Distributed file systems often exhibit high tail latencies, especially in large-scale datacenters and in the presence of competing (and possibly higher priority) workloads. This paper introduces techniques for managing tail latencies in these systems, while addressing the practical challenges inherent in production datacenters (e.g., hardware heterogeneity, interference from other workloads, the need to maximize simplicity and maintainability). We implement our techniques in a scalable distributed file system …

Cited by 32 publications
(7 citation statements)
References 32 publications
“…Parameter $z$ is a function of the workload and it will be explained shortly. The global function $f : \mathbb{R}^N \to \mathbb{R}$ is the sum of the cost function (7) of each node $v_i$. The main goal of the nodes is to allocate the jobs in order to minimize the cost function in a distributed fashion, by communicating with their neighbors only.…”
Section: Optimization Problem (mentioning)
confidence: 99%
“…Solving a scheduling optimization problem in such a large-scale system is challenging due to the size of the network and the dynamic nature of resource requirements of incoming and existing workloads. Furthermore, due to unexpected cluster changes as nodes randomly fail and/or abnormal runtime behaviors due to software or configuration faults and resource contention, latency variability is introduced into the network [1], [7]. To this end, we posit a novel scheme that takes into account these potential latency variations in the form of explicit delays in the communication links during planning, while still remaining asynchronous in its operation, and we guarantee that it will converge in finite time.…”
Section: Introduction (mentioning)
confidence: 99%
“…The main reason for tail latency in RAID-enabled SSDs is that different RAID components (e.g. SSD channels) have uneven busyness on I/Os and GCs while running user applications [18]. Especially with respect to the issue of mitigating the negative effects of garbage collection that is the heaviest operation in SSDs, I/O requests on the GC target channels will be fulfilled by reading the data on other channels of the same stripe with certain XOR computations [15,19,20].…”
Section: Introduction (mentioning)
confidence: 99%
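The XOR-based read mentioned in the statement above can be illustrated with a minimal sketch. It assumes a single-parity stripe (one parity block that is the XOR of all data blocks) and at most one busy channel; the function names `xor_blocks`, `make_stripe`, and `read_block` are hypothetical and are not taken from the cited papers.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block (assumed layout)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def read_block(stripe, index, busy_channel=None):
    """Read block `index`; if its channel is busy (e.g. running GC),
    rebuild it from every other block in the stripe instead of waiting."""
    if index != busy_channel:
        return stripe[index]
    others = [blk for i, blk in enumerate(stripe) if i != index]
    # With single parity, the XOR of all other blocks equals the missing one.
    return xor_blocks(others)


stripe = make_stripe([b"abcd", b"efgh", b"ijkl"])
rebuilt = read_block(stripe, 1, busy_channel=1)
```

Here `rebuilt` is recovered without touching channel 1, at the cost of reading every other channel in the stripe. Whether that trade pays off depends on how busy those other channels are.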
“…The development of new flash memories such as 3D-stacked charge-trap (CT)-based ones largely benefits the storage density of modern SSDs. Meanwhile, they show some new physical characteristics, e.g., the increased block size and layer speed variation, the effect of which on performance has not been fully investigated [9].…”
Section: Introduction (mentioning)
confidence: 99%