SUMMARY
Recent improvements in network and workstation performance have made workstation clusters an attractive architecture for diverse workloads, including interactive sequential and parallel applications. Although viable hardware solutions are available today, the largest challenge in making such a cluster usable lies in the system software. This paper describes the design and implementation of GLUnix, operating system middleware for a cluster of workstations. GLUnix was designed to provide transparent remote execution, support for interactive parallel and sequential jobs, load balancing, and backward compatibility for existing application binaries. GLUnix was constructed to be easily portable to a number of platforms. GLUnix has been in daily use for over two and a half years and is currently running on a 100-node cluster of Sun UltraSPARCs. This paper relates our experiences with designing, building, and operating GLUnix. We discuss three important design tradeoffs faced by any cluster system, and present the reasons for our choices. Each of these design decisions is then re-evaluated in light of both our experience and recent technological advancements. We then describe the user-level, centralized, event-driven architecture of GLUnix and highlight a number of aspects of the implementation. Performance and scalability measurements of the system indicate that a centralized, user-level design can scale gracefully to significant cluster sizes, incurring only an additional 220 µs of overhead per node for remote execution. The discussion focuses on the successes and failures we encountered while building and maintaining the system, including a characterization of the limitations of a user-level implementation and various features that were added to satisfy the user community.