Data centre applications for batch processing (e.g. map/reduce frameworks) and online services (e.g. search engines) scale by distributing data and computation across many servers. They typically follow a partition/aggregation pattern: tasks are first partitioned across servers that process data locally, and then those partial results are aggregated. This data aggregation step, however, shifts the performance bottleneck to the network, which typically struggles to support many-to-few, high-bandwidth traffic between servers.Instead of performing data aggregation at edge servers, we show that it can be done more efficiently along network paths. We describe NETAGG, a software platform that supports on-path aggregation for network-bound partition/aggregation applications. NET-AGG exploits a middlebox-like design, in which dedicated servers (agg boxes) are connected by high-bandwidth links to network switches. Agg boxes execute aggregation functions provided by applications, which alleviates network hotspots because only a fraction of the incoming traffic is forwarded at each hop. NETAGG requires only minimal application changes: it uses shim layers on edge servers to redirect application traffic transparently to the agg boxes. Our experimental results show that NETAGG improves substantially the throughput of two sample applications, the Solr distributed search engine and the Hadoop batch processing framework. Its design allows for incremental deployment in existing data centres and incurs only a modest investment cost.
Middleboxes are heavily used in the Internet to process the network traffic for a specific purpose. As there is no open standards, these proprietary boxes are expensive and difficult to upgrade. In this paper, we present a programmable platform for middleboxes called FlowOS to run on commodity hardware. It provides an elegant programming model for writing flow processing software, which hides the complexities of low-level packet processing, process synchronisation, and inter-process communication. We show that FlowOS itself does not add any significant overhead to flows by presenting some preliminary test results.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.