Proceedings of the 15th International Conference on Emerging Networking Experiments and Technologies 2019
DOI: 10.1145/3359989.3365412
RSS++: Load and State-Aware Receive Side Scaling

Abstract: While the current literature typically focuses on load-balancing among multiple servers, in this paper, we demonstrate the importance of load-balancing within a single machine (potentially with hundreds of CPU cores). In this context, we propose a new load-balancing technique (RSS++) that dynamically modifies the receive side scaling (RSS) indirection table to spread the load across the CPU cores in a more optimal way. RSS++ incurs up to 14x lower 95th percentile tail latency and orders of magnitude fewer pac…
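To make the mechanism in the abstract concrete, here is a minimal sketch (Python, illustrative only, not the authors' implementation): RSS hashes each flow to one of the indirection table's buckets, per-bucket load counters are assumed to be available, and a helper `program_reta` stands in for the driver or DPDK call that would rewrite the NIC's table.

```python
from collections import defaultdict

NUM_BUCKETS = 128   # typical RSS indirection-table size (NIC-dependent)
NUM_CORES = 4

def rebalance(reta, bucket_load):
    """Greedily move buckets away from cores that are above the average load."""
    core_load = defaultdict(float)
    for b, core in enumerate(reta):
        core_load[core] += bucket_load[b]
    target = sum(bucket_load) / NUM_CORES
    new_reta = list(reta)
    # Visit buckets from hottest to coldest and reassign them to the least
    # loaded core whenever that keeps the destination below the average.
    for b in sorted(range(NUM_BUCKETS), key=lambda i: bucket_load[i], reverse=True):
        src = new_reta[b]
        dst = min(core_load, key=core_load.get)
        if core_load[src] > target and core_load[dst] + bucket_load[b] <= target:
            new_reta[b] = dst
            core_load[src] -= bucket_load[b]
            core_load[dst] += bucket_load[b]
    return new_reta

def program_reta(reta):
    # Stand-in for the NIC-specific call that rewrites the RSS indirection
    # table; here we only print part of the new bucket-to-core mapping.
    print("cores per bucket:", reta[:16], "...")

reta = [b % NUM_CORES for b in range(NUM_BUCKETS)]  # default round-robin table
load = [1.0] * NUM_BUCKETS
load[0] = 200.0                                     # one hot bucket overloads core 0
program_reta(rebalance(reta, load))
```

In a real deployment the per-bucket counters and the table update would go through the NIC driver; the sketch only captures the kind of bucket-reassignment decision the abstract describes.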

Cited by 30 publications (4 citation statements) · References 34 publications

Citation statements:
“…In addition to the works discussed throughout the paper, the work on NFV performance acceleration can be classified into three categories: (1) relies on hardware accelerators to improve processing speed by offloading (part of) packet processing into an FPGA, GPU, or modern NIC [20,28,45,52,53,69,87,96,98,101,104,105]; (2) focuses on NFV execution models and tries to improve the performance of either the pipeline/parallelism model [43,55,61,86,103] or run-to-completion (RTC) model [37,76]; and (3) improves the performance of NFV by reducing/eliminating redundant operations and/or merging similar packet processing elements into (one) consolidated optimized equivalent [1,12,40,44,55,85]. The second category also includes efforts toward better scheduling & load balancing [4,5,7,41,50,94] or more efficient I/O [24,25]. Our work is orthogonal and complementary to these.…”
Section: Related Work
confidence: 99%
“…This might be a problem when the derived traffic classes aggregate large subnets, which might potentially result in thousands (or even millions) of (concurrent) flows ending up at the same CPU core. In recent work (specifically RSS++ [6]), we look at ways to automatically derive sub-traffic classes of a given traffic class (i.e., by tweaking a NIC's RSS indirection table) to perform load balancing even in the presence of a few (large) traffic classes.…”
Section: Metron's Dynamic Scaling at 100 Gbps
confidence: 99%
“…Recently, RSS++ [6] exploited available commodity NICs' functionality to achieve stateful intraserver load balancing with minimal overhead. Metron dispatches traffic through explicit (NIC and/or OpenFlow) rules, while RSS++ steers flows to cores by modifying a NIC's RSS indirection table.…”
Section: Hardware Offloading
confidence: 99%
“…Scheduling Policies: Based on the fact that First-Come-First-Serve (FCFS) scheduling has been shown [67] to be tail-optimal for light-tailed homogeneous tasks, many older systems did hash-based load balancing on the network interface card (NIC) using receive side scaling (RSS) and running requests to completion [24], [51], [56]. To handle imbalance between workers, newer systems enhance RSS to take into account end-host load (RSS++ [22], eRSS [60]), employ work-stealing (ZygOS [57], Shenango [55], Caladan [36], BWS [32], Elfen [68], Li et al. [50]), or use techniques, such as join-idle-queue [52] or join-bounded-shortest-queue [49].…”
Section: User-level Threading
confidence: 99%
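The statement above contrasts purely hash-based (RSS-style) dispatch, where each flow is pinned to one core's queue and served FCFS, with mechanisms that react to imbalance, such as work-stealing. The sketch below is illustrative only (assumed names, not taken from any of the cited systems): requests are hashed to per-core queues, and an otherwise idle core steals a request from the longest queue.

```python
import collections

CORES = 4
queues = [collections.deque() for _ in range(CORES)]  # one FCFS queue per core

def dispatch(flow_id, req):
    """Hash-based (RSS-style) placement: a flow always lands on the same core."""
    queues[hash(flow_id) % CORES].append(req)

def steal(idle_core):
    """Work-stealing: an idle core takes one request from the busiest queue."""
    victim = max(range(CORES), key=lambda c: len(queues[c]))
    if victim != idle_core and queues[victim]:
        queues[idle_core].append(queues[victim].popleft())

# Example: a skewed arrival pattern keeps hammering one flow, so its core's
# queue grows while the others stay short.
for i in range(20):
    dispatch("flow-0" if i % 2 else f"flow-{i}", f"req-{i}")
print("before stealing:", [len(q) for q in queues])

idle = min(range(CORES), key=lambda c: len(queues[c]))
steal(idle)
print("after one steal:", [len(q) for q in queues])
```

Load-aware RSS enhancements (like the bucket reassignment sketched earlier) rebalance at the NIC before packets reach a core, whereas work-stealing corrects imbalance after dispatch, inside the host.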