Proceedings of the 15th International Conference on Emerging Networking Experiments and Technologies 2019
DOI: 10.1145/3359989.3365412
RSS++: Load and State-Aware Receive Side Scaling

Abstract: While the current literature typically focuses on load-balancing among multiple servers, in this paper, we demonstrate the importance of load-balancing within a single machine (potentially with hundreds of CPU cores). In this context, we propose a new load-balancing technique (RSS++) that dynamically modifies the receive side scaling (RSS) indirection table to spread the load across the CPU cores in a more optimal way. RSS++ incurs up to 14x lower 95th percentile tail latency and orders of magnitude fewer pac…
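To make the mechanism in the abstract concrete, here is a minimal sketch (Python, illustrative only, not the authors' implementation): RSS hashes each flow to one of the indirection table's buckets, per-bucket load counters are assumed to be available, and a helper `program_reta` stands in for the driver or DPDK call that would rewrite the NIC's table.

```python
from collections import defaultdict

NUM_BUCKETS = 128   # typical RSS indirection-table size (NIC-dependent)
NUM_CORES = 4

def rebalance(reta, bucket_load):
    """Greedily move buckets away from cores that are above the average load."""
    core_load = defaultdict(float)
    for b, core in enumerate(reta):
        core_load[core] += bucket_load[b]
    target = sum(bucket_load) / NUM_CORES
    new_reta = list(reta)
    # Visit buckets from hottest to coldest and reassign them to the least
    # loaded core whenever that keeps the destination below the average.
    for b in sorted(range(NUM_BUCKETS), key=lambda i: bucket_load[i], reverse=True):
        src = new_reta[b]
        dst = min(core_load, key=core_load.get)
        if core_load[src] > target and core_load[dst] + bucket_load[b] <= target:
            new_reta[b] = dst
            core_load[src] -= bucket_load[b]
            core_load[dst] += bucket_load[b]
    return new_reta

def program_reta(reta):
    # Stand-in for the NIC-specific call that rewrites the RSS indirection
    # table; here we only print part of the new bucket-to-core mapping.
    print("cores per bucket:", reta[:16], "...")

reta = [b % NUM_CORES for b in range(NUM_BUCKETS)]  # default round-robin table
load = [1.0] * NUM_BUCKETS
load[0] = 200.0                                     # one hot bucket overloads core 0
program_reta(rebalance(reta, load))
```

In a real deployment the per-bucket counters and the table update would go through the NIC driver; the sketch only captures the kind of bucket-reassignment decision the abstract describes.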

Cited by 30 publications (4 citation statements) · References 34 publications

Citation statements:
“…In addition to the works discussed throughout the paper, the work on NFV performance acceleration can be classified into three categories: (1) relies on hardware accelerators to improve processing speed by offloading (part of) packet processing into an FPGA, GPU, or modern NIC [20,28,45,52,53,69,87,96,98,101,104,105]; (2) focuses on NFV execution models and tries to improve the performance of either the pipeline/parallelism model [43,55,61,86,103] or run-to-completion (RTC) model [37,76]; and (3) improves the performance of NFV by reducing/eliminating redundant operations and/or merging similar packet processing elements into (one) consolidated optimized equivalent [1,12,40,44,55,85]. The second category also includes efforts toward better scheduling & load balancing [4,5,7,41,50,94] or more efficient I/O [24,25]. Our work is orthogonal and complementary to these.…”
Section: Related Work
confidence: 99%
“…This might be a problem when the derived traffic classes aggregate large subnets, which might potentially result in thousands (or even millions) of (concurrent) flows ending up at the same CPU core. In recent work (specifically RSS++ [6]), we look at ways to automatically derive sub-traffic classes of a given traffic class (i.e., by tweaking a NIC's RSS indirection table) to perform load balancing even in the presence of a few (large) traffic classes.…”
Section: Metron's Dynamic Scaling at 100 Gbps
confidence: 99%
“…Recently, RSS++ [6] exploited available commodity NICs' functionality to achieve stateful intraserver load balancing with minimal overhead. Metron dispatches traffic through explicit (NIC and/or OpenFlow) rules, while RSS++ steers flows to cores by modifying a NIC's RSS indirection table.…”
Section: Hardware Offloading
confidence: 99%
“…Scheduling Policies: Based on the fact that First-Come-First-Serve (FCFS) scheduling has been shown [67] to be tail-optimal for light-tailed homogeneous tasks, many older systems did hash-based load balancing on the network interface card (NIC) using receive side scaling (RSS) and running requests to completion [24], [51], [56]. To handle imbalance between workers, newer systems enhance RSS to take into account end-host load (RSS++ [22], eRSS [60]), employ work-stealing (ZygOS [57], Shenango [55], Caladan [36], BWS [32], Elfen [68], Li et al. [50]), or use techniques, such as join-idle-queue [52] or join-bounded-shortest-queue [49].…”
Section: User-level Threading
confidence: 99%
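The statement above contrasts purely hash-based (RSS-style) dispatch, where each flow is pinned to one core's queue and served FCFS, with mechanisms that react to imbalance, such as work-stealing. The sketch below is illustrative only (assumed names, not taken from any of the cited systems): requests are hashed to per-core queues, and an otherwise idle core steals a request from the longest queue.

```python
import collections

CORES = 4
queues = [collections.deque() for _ in range(CORES)]  # one FCFS queue per core

def dispatch(flow_id, req):
    """Hash-based (RSS-style) placement: a flow always lands on the same core."""
    queues[hash(flow_id) % CORES].append(req)

def steal(idle_core):
    """Work-stealing: an idle core takes one request from the busiest queue."""
    victim = max(range(CORES), key=lambda c: len(queues[c]))
    if victim != idle_core and queues[victim]:
        queues[idle_core].append(queues[victim].popleft())

# Example: a skewed arrival pattern keeps hammering one flow, so its core's
# queue grows while the others stay short.
for i in range(20):
    dispatch("flow-0" if i % 2 else f"flow-{i}", f"req-{i}")
print("before stealing:", [len(q) for q in queues])

idle = min(range(CORES), key=lambda c: len(queues[c]))
steal(idle)
print("after one steal:", [len(q) for q in queues])
```

Load-aware RSS enhancements (like the bucket reassignment sketched earlier) rebalance at the NIC before packets reach a core, whereas work-stealing corrects imbalance after dispatch, inside the host.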