L7-filter is a significant component of Linux's QoS framework that classifies network traffic based on application-layer data. It enables network resources to be subsequently distributed according to application priority. Considerable research has been reported on deploying multi-core architectures for computationally intensive applications. Unfortunately, the proliferation of multi-core architectures has not helped fast packet processing, for two reasons: 1) the lack of efficient parallelism in legacy network programs, and 2) the non-trivial configuration needed for scalable utilization of multi-core servers. In this paper, we propose a highly scalable parallelized L7-filter system architecture with affinity-based scheduling on a multi-core server. We start with an analytical study of the system architecture based on an offline design. Similar to Receive Side Scaling (RSS) in the NIC, we develop a model to explore the connection-level parallelism in L7-filter and propose an affinity-based scheduler to optimize system scalability. Performance results show that our optimized L7-filter scales better than the naive multithreaded version. It improves system performance by about 50% when all the cores are deployed.
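The idea of connection-level parallelism with affinity can be illustrated with a minimal sketch. It is not the scheduler proposed in the paper. It only shows the general RSS-like principle of hashing a connection identifier so that every packet of a flow, in both directions, lands on the same worker. The `Packet` type, the function names, and the use of CRC32 (in place of the Toeplitz hash that RSS hardware typically uses) are assumptions made for illustration.

```python
import zlib
from collections import namedtuple

# Hypothetical packet record; field names are assumptions for this sketch.
Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port proto payload")


def connection_key(pkt):
    """Build a direction-independent key for the packet's connection.

    Sorting the two endpoints makes A->B and B->A produce the same key,
    so both halves of a connection are classified by the same worker.
    """
    lo, hi = sorted([(pkt.src_ip, pkt.src_port), (pkt.dst_ip, pkt.dst_port)])
    return f"{lo[0]}:{lo[1]}-{hi[0]}:{hi[1]}-{pkt.proto}".encode()


def pick_worker(pkt, num_workers):
    """Map a packet to a worker index via a deterministic hash of its flow.

    CRC32 is a stand-in chosen for simplicity, not the hash used by RSS.
    """
    return zlib.crc32(connection_key(pkt)) % num_workers


def dispatch(packets, num_workers):
    """Distribute packets into per-worker queues, preserving flow affinity."""
    queues = [[] for _ in range(num_workers)]
    for pkt in packets:
        queues[pick_worker(pkt, num_workers)].append(pkt)
    return queues
```

In a real deployment each queue would be drained by a thread or process pinned to one core, for example with `os.sched_setaffinity` on Linux, so that per-connection classification state stays in that core's cache. How the paper's scheduler balances load across cores is not specified in the abstract and is not modeled here.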
Virtualization technology is now widely deployed on high-performance networks such as 10-Gigabit Ethernet (10GE). It offers useful features like functional isolation, manageability and live migration. Unfortunately, the overhead of network I/O virtualization significantly degrades the performance of network-intensive applications. Two major sources of I/O performance loss are the extra driver domain that processes I/O requests and the extra scheduler inside the virtual machine monitor (VMM) that schedules domains. In this paper, we first examine the negative effect of virtualization on multi-core platforms with 10GE networking. We study virtualization overhead and develop two optimizations for the VMM scheduler to improve I/O performance. The first uses cache-aware scheduling to reduce inter-domain communication cost. The second steals scheduler credits to favor I/O VCPUs in the driver domain. We also propose two optimizations to improve packet processing in the driver domain. First, we redesign a simple bridge for more efficient switching of packets. Second, we develop a patch that makes the transmit (TX) queue length in the driver domain configurable and adaptable to 10GE networks. Using all of the above techniques, our experiments show that virtualized I/O bandwidth can be increased by 96%. Our optimizations also improve efficiency, saving 36% in core utilization per gigabit. All the optimizations are pure software approaches and do not hinder live migration. We believe that the findings from our study will be useful to guide future VMM development.