Proceedings of the 2018 Afternoon Workshop on Kernel Bypassing Networks
DOI: 10.1145/3229538.3229541
Flow Control for Latency-Critical RPCs

Abstract: In today's datacenters, the waiting time spent within a server's queue is a major contributor to the end-to-end tail latency of µs-scale remote procedure calls. In traditional TCP, congestion control handles in-network congestion, while flow control was designed to avoid memory overruns in streaming scenarios. The latter is unfortunately oblivious to the load on the server when processing short requests from multiple clients at very high rates. Acknowledging flow control as the mechanism that controls q…

Cited by 5 publications (3 citation statements)
References 39 publications
“…The load generated by devices tends to follow certain rules (e.g., Poisson inter-arrival time distribution [37]). At the beginning of each time slice, the end device possesses multiple tasks k_m(t), and the data size of k_m(t) obeys a uniform distribution.…”
Section: Mobile Device Node
confidence: 99%
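The workload model quoted above (Poisson arrivals, uniformly distributed task sizes) can be sketched as follows. This is an illustrative generator, not code from the cited work; the rate, size bounds, and horizon are assumed parameters.

```python
import random

random.seed(42)  # fixed seed only for reproducibility of this sketch

def generate_tasks(rate_lambda, size_min, size_max, horizon):
    """Sample task arrivals over [0, horizon): inter-arrival gaps are
    exponential (i.e., a Poisson arrival process at rate lambda) and
    each task's data size is drawn from a uniform distribution."""
    tasks = []
    t = random.expovariate(rate_lambda)
    while t < horizon:
        tasks.append((t, random.uniform(size_min, size_max)))
        t += random.expovariate(rate_lambda)
    return tasks

# ~100 tasks/second over a 1-second horizon, sizes between 1 and 8 units
tasks = generate_tasks(rate_lambda=100.0, size_min=1.0, size_max=8.0, horizon=1.0)
```

With a Poisson process, the expected task count equals rate × horizon, so this run produces roughly 100 tasks with strictly increasing arrival times.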
“…Provisioning a receive buffer of limited size on the server requires the transport protocol to signal a "failure to deliver" (NACK) if the request is dropped because of a full queue. It is up to the client to react to a NACK reception; for example, the request could be retried or sent to a different server, as proposed by Kogias et al [45]. Exposing delivery failures to the client follows the end-to-end principle in systems design [46]: the client application is best equipped to handle such violations and should be informed immediately.…”
Section: Bounded Server-side Queuing
confidence: 99%
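The NACK-and-failover behavior described above can be modeled in a few lines. This is a toy sketch under assumed names (`BoundedServer`, `client_send`, `QUEUE_LIMIT`), not the protocol's actual implementation; real systems would size the bound from the SLO and carry the NACK in the transport.

```python
from collections import deque

QUEUE_LIMIT = 4  # illustrative bound; a real deployment derives this from the SLO

class BoundedServer:
    """Toy model of a server with a bounded receive queue: admit a
    request if there is room, otherwise signal a failure to deliver."""
    def __init__(self, limit=QUEUE_LIMIT):
        self.queue = deque()
        self.limit = limit

    def offer(self, request):
        if len(self.queue) >= self.limit:
            return "NACK"  # queue full: explicit failure to deliver
        self.queue.append(request)
        return "ACK"

def client_send(servers, request):
    """End-to-end reaction to a NACK: the client, not the network,
    decides what to do -- here it simply tries the next replica."""
    for s in servers:
        if s.offer(request) == "ACK":
            return s
    return None  # all replicas saturated; caller may retry later or drop
```

Keeping the reaction in the client mirrors the end-to-end argument cited above: the application knows best whether to retry, redirect, or shed the request.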
“…The need for low latency has led systems designers to aggressively limit queuing in the transport and RPC protocol stacks themselves [45], [59]. Kogias et al [45] also observe that limiting server-side queuing is critical for µs-scale RPCs, and use TCP flow control to limit the number of requests per connection based on the application's SLO. NEBULA performs buffer management for hardware-rather than software-terminated protocols.…”
Section: Related Work
confidence: 99%
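The idea of limiting per-connection requests based on the application's SLO can be illustrated with a back-of-the-envelope credit calculation. This is a rough sketch of the reasoning, not the paper's actual flow-control algorithm; the function name and the simple SLO-to-credit mapping are assumptions.

```python
def per_connection_credit(slo_us, service_us):
    """Illustrative credit bound: if each request takes roughly
    service_us to process, any request queued deeper than
    slo_us / service_us positions cannot meet the latency SLO,
    so cap outstanding requests per connection at that depth."""
    return max(1, slo_us // service_us)

# e.g. a 100 µs SLO with a 10 µs per-request service time
# admits at most 10 outstanding requests on the connection
credit = per_connection_credit(slo_us=100, service_us=10)
```

Advertising this bound through the transport's existing flow-control window is what lets the server shed excess load before requests ever queue past the SLO.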