The rapid growth of the Internet over the past several years has emphasized speed of deployment, in many cases at the expense of the requirements of high-performance end-to-end transactions. Because the Internet is primarily an infrastructure for moving information from where it is stored to where it is requested, we can view the communication path between end points as a chain and each networking device along the path as a link in that chain, as shown in Figure 1.

So how can we make this chain stronger? Clustering technology offers a way to increase overall reliability and performance by strengthening one link in the chain without adding others. We have implemented this technology in a distributed computing architecture for network elements. The architecture, called Raincore, originated in the Reliable Array of Independent Nodes, or RAIN, research collaboration between the California Institute of Technology and the U.S. National Aeronautics and Space Administration's Jet Propulsion Laboratory.1 The RAIN project focused on developing high-performance, fault-tolerant, portable clustering technology for spaceborne computing (see the sidebar, "RAIN and Other Related Work in Cluster Technology," p. 72). The technology that emerged from this project became the basis for a spinoff company, Rainfinity, which holds the exclusive intellectual property rights to the RAIN technology.

In this report, we describe the Raincore conceptual architecture and distributed services, which are designed to make it easy for developers to port their applications to run on top of a cluster of networking elements. We also describe two applications: a Web server prototype that was part of the original RAIN research project and a commercial firewall cluster product from Rainfinity.
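The chain analogy can be made concrete with a small availability calculation. The sketch below is illustrative only: it assumes that devices fail independently, that each device is characterized by a single steady-state availability, and that fail-over inside a cluster is perfect. The device list and availability figures are hypothetical, not measurements. A serial path is up only if every link is up, so its availability is the product of the link availabilities; a cluster of n nodes is up if at least one node is up.

```python
# Illustrative sketch of the "chain" analogy.
# Assumptions (not from the article): independent failures, one
# steady-state availability per device, perfect fail-over in a cluster.
from math import prod


def chain_availability(links: list[float]) -> float:
    """A serial chain works only if every link works."""
    return prod(links)


def cluster_availability(node: float, n: int) -> float:
    """A cluster of n identical nodes works if at least one node is up."""
    return 1.0 - (1.0 - node) ** n


# Hypothetical path: router, firewall, load balancer, Web server.
path = [0.999, 0.99, 0.999, 0.995]
print(f"single firewall:   {chain_availability(path):.5f}")

# Replace only the firewall link with a two-node firewall cluster;
# the chain keeps the same number of links.
clustered = path.copy()
clustered[1] = cluster_availability(0.99, 2)
print(f"firewall cluster:  {chain_availability(clustered):.5f}")
```

Clustering the firewall raises that link's availability from 0.99 to 0.9999 without lengthening the path, which is the sense in which clustering strengthens one link without adding others. Real clusters fall short of this ideal because failure detection and fail-over take time.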
Distributed Systems in a Networking Environment

A cluster of networking elements operating together for the same purpose effectively creates a distributed system in a networking environment. For example, the front end of the server farm in Figure 1 could use a cluster of firewalls rather than a single firewall.

The objectives of such a distributed system are to
• balance the processing load by distributing network traffic among the member nodes in a way that increases overall throughput, and
• enable healthy nodes to detect failed nodes and take over their network traffic without interrupting the traffic flow.

The key challenges of a distributed system are to maintain consensus among the machines on the exact state of the cluster and to make collective decisions without conflicts. Networking environments impose three requirements on distributed system solutions that are unique to this setting:
• the need to scale up networking throughput as well as computing power,
• the need to compensate for the negative performance effects of task switching between the different services that networking elements support, and
• the need for fast fail-over to maintain network connections when failures occur.

Raincore is designed to meet these challenges for Internet applications at corporate firewalls and other gateways...
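The two objectives above, distributing traffic among member nodes and letting healthy nodes take over a failed node's traffic, can be illustrated with a short sketch. This is not Raincore's own mechanism; it substitutes rendezvous (highest-random-weight) hashing, one common way to assign flows to nodes. It assumes that all healthy nodes share the same membership view, which is what the consensus requirement is meant to guarantee. The node names and flow identifiers are hypothetical.

```python
# Hypothetical sketch: assign each flow to a healthy node with
# rendezvous (highest-random-weight) hashing. Not Raincore's protocol.
import hashlib


def _weight(node: str, flow: str) -> int:
    """Deterministic pseudo-random weight for a (node, flow) pair."""
    digest = hashlib.sha256(f"{node}|{flow}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(flow: str, healthy: set[str]) -> str:
    """Return the healthy node with the highest weight for this flow."""
    if not healthy:
        raise RuntimeError("no healthy nodes in the cluster")
    # Sorting makes tie-breaking deterministic on every node.
    return max(sorted(healthy), key=lambda node: _weight(node, flow))


nodes = {"fw-a", "fw-b", "fw-c"}                         # hypothetical firewalls
flows = [f"10.0.0.{i}:443" for i in range(1, 101)]       # hypothetical flow keys
before = {f: owner(f, nodes) for f in flows}

# Suppose the membership service reports that fw-b has failed.
survivors = nodes - {"fw-b"}
after = {f: owner(f, survivors) for f in flows}

moved = [f for f in flows if before[f] != after[f]]
# Only flows previously owned by fw-b change owners.
assert all(before[f] == "fw-b" for f in moved)
print(f"{len(moved)} of {len(flows)} flows moved to surviving nodes")
```

Because every node evaluates the same deterministic function, nodes that agree on membership also agree on which node owns each flow, without exchanging per-flow messages, and flows on surviving nodes are left undisturbed. If two nodes held different membership views, they could both claim, or both ignore, the same flow, which shows why consensus on the exact state of the cluster is the central challenge.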