Abstract: Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-level filesystem. This filesystem, HDFS, is written in Java and designed for portability across heterogeneous hardware and software platforms. This paper analyzes the performance of HDFS and uncovers several performance issues. First, architectural bottlenecks exist in the Hadoop implementation that result in inefficient HDFS usage due to delays in scheduling new MapReduce tasks. Second, portability limitations prevent the Java implementation from exploiting features of the native platform. Third, HDFS implicitly makes portability assumptions about how the native platform manages storage resources, even though native filesystems and I/O schedulers vary widely in design and behavior. This paper investigates the root causes of these performance bottlenecks in order to evaluate tradeoffs between portability and performance in the Hadoop distributed filesystem.
The evaluation of new network server architectures is usually performed experimentally using either a simulator or a hardware prototype. Accurate simulation of the hardware-software interface within the network subsystem is challenging due to the interactions of multiple asynchronous systems. Small timing inaccuracies in such a system can perturb the hardware and software state, yielding potentially misleading results. Hardware prototypes show more promise because they are real-world implementations, not simplifications. Existing Ethernet network interface cards (NICs) are unsuitable for prototyping as they lack the capability and/or flexibility for advanced networking research. RiceNIC is an open network interface prototyping platform for public use. This reconfigurable and programmable Gigabit Ethernet NIC is designed to address the dilemma of how to accurately evaluate new ideas in network server architecture, and is built for use in experimental research and education. The flexibility and capability of RiceNIC have proven invaluable in recent research efforts.
This paper introduces the Axon, an Ethernet-compatible device for creating large-scale datacenter networks. Axons are inexpensive, practical devices that are demonstrated using prototype hardware. Functionally, Axons replace Ethernet switches and maintain full compatibility with existing Ethernet hosts. Between themselves, however, Axons transparently use source-routed Ethernet. This unlocks many benefits, such as improved network scalability, performance, and flexibility. In an Axon network, all state required to route a host's packets is placed in the local Axon, that is, the Axon to which the host is directly connected. Therefore, regardless of the scale of the network, the route computation and storage needs of a single Axon device only need to scale with the demands of its locally-connected hosts. This is in stark contrast to conventional switched Ethernet, which requires routing resources proportional to the traffic that flows through the device. Scalability is also increased by eliminating the use of packet flooding for automatic location and address discovery. Further, source-routed Ethernet increases network flexibility by supporting different route selection strategies. For example, shortest-path routing could be employed, or longer paths selected to minimize congestion by balancing traffic across redundant links.
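The source-routing idea in the Axon abstract can be illustrated with a small sketch. It is not the Axon header format or implementation, which the abstract does not describe. The topology, the node and port names, and the functions `shortest_path_ports` and `forward` are all hypothetical. The sketch assumes the ingress device knows the topology, computes a shortest path, and writes the list of output ports into the packet, so that transit devices only read the next port and keep no per-host routing state.

```python
from collections import deque

# Hypothetical four-device topology: node -> {output port: neighbor}.
TOPOLOGY = {
    "A": {1: "B", 2: "C"},
    "B": {1: "A", 2: "D"},
    "C": {1: "A", 2: "D"},
    "D": {1: "B", 2: "C"},
}


def shortest_path_ports(topology, src, dst):
    """Breadth-first search run at the ingress device.

    Returns the list of output ports to take at each hop from src to dst.
    """
    prev = {src: None}  # node -> (previous node, port taken from it)
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for port, neighbor in sorted(topology[node].items()):
            if neighbor not in prev:
                prev[neighbor] = (node, port)
                queue.append(neighbor)
    if dst not in prev:
        raise ValueError(f"no route from {src} to {dst}")

    # Walk back from the destination to recover the port sequence.
    ports = []
    node = dst
    while prev[node] is not None:
        node, port = prev[node]
        ports.append(port)
    ports.reverse()
    return ports


def forward(topology, ingress, ports):
    """Follow a source route hop by hop.

    Each transit device consumes one port number from the route and needs
    no forwarding table of its own. Returns the devices visited.
    """
    node = ingress
    visited = [node]
    for port in ports:
        node = topology[node][port]
        visited.append(node)
    return visited


route = shortest_path_ports(TOPOLOGY, "A", "D")
path = forward(TOPOLOGY, "A", route)
```

A different route selection strategy, such as one that prefers less-loaded links, would replace only `shortest_path_ports`; `forward` stays the same because transit devices do not take part in the routing decision.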