David Petrou scite author profile

SUMMARY: Recent improvements in network and workstation performance have made workstation clusters an attractive architecture for diverse workloads, including interactive sequential and parallel applications. Although viable hardware solutions are available today, the largest challenge in making such a cluster usable lies in the system software. This paper describes the design and implementation of GLUnix, operating system middleware for a cluster of workstations. GLUnix was designed to provide transparent remote execution, support for interactive parallel and sequential jobs, load balancing, and backward compatibility for existing application binaries. GLUnix was constructed to be easily portable to a number of platforms. GLUnix has been in daily use for over two and a half years and is currently running on a 100-node cluster of Sun UltraSPARCs. This paper relates our experiences with designing, building, and operating GLUnix. We discuss three important design tradeoffs faced by any cluster system, and present the reasons for our choices. Each of these design decisions is then re-evaluated in light of both our experience and recent technological advancements. We then describe the user-level, centralized, event-driven architecture of GLUnix and highlight a number of aspects of the implementation. Performance and scalability measurements of the system indicate that a centralized, user-level design can scale gracefully to significant cluster sizes, incurring only an additional 220 µs of overhead per node for remote execution. The discussion focuses on the successes and failures we encountered while building and maintaining the system, including a characterization of the limitations of a user-level implementation and various features that were added to satisfy the user community.
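The centralized load-balancing idea in this summary can be illustrated with a small sketch. It is not GLUnix code: the `Master` class, its method names, and the load metric are hypothetical, and the policy shown (send each new process to the least-loaded nodes) is only one plausible choice that the summary does not confirm.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    load: float  # hypothetical load metric, e.g. number of runnable processes


class Master:
    """Toy central master that tracks node load and places processes."""

    def __init__(self):
        self.nodes = {}

    def report_load(self, name, load):
        # In a real system, a per-node daemon would send these updates.
        self.nodes[name] = Node(name, load)

    def place(self, n_procs):
        """Pick the n_procs least-loaded nodes for a parallel job."""
        if n_procs > len(self.nodes):
            raise ValueError("not enough nodes for the requested job")
        chosen = heapq.nsmallest(
            n_procs, self.nodes.values(), key=lambda n: (n.load, n.name)
        )
        for node in chosen:
            node.load += 1.0  # assume each placed process adds one unit of load
        return [node.name for node in chosen]


master = Master()
master.report_load("a", 0.5)
master.report_load("b", 0.1)
master.report_load("c", 2.0)
first = master.place(2)
second = master.place(1)
```

Because every placement decision passes through one process, the master sees a consistent view of the cluster, which is the property a centralized design trades against scalability.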


Cluster scheduling for explicitly-speculative tasks

Petrou

Ganger

Gibson

2004


Abstract: A process scheduler on a shared cluster, grid, or supercomputer that is informed which submitted tasks are possibly unneeded speculative tasks can use this knowledge to better support increasingly prevalent user work habits, lowering user-visible response time, lowering user costs, and increasing resource provider revenue.

Large-scale computing often consists of many speculative tasks (tasks that may be canceled) to test hypotheses, search for insights, and review potentially finished products. For example, speculative tasks are issued by bioinformaticists comparing DNA sequences, computer graphics artists rendering scenes, and computer researchers studying caching. This behavior (exploratory searches and parameter studies, made more common by the cost-effectiveness of cluster computing) on existing schedulers without speculative task support results in a mismatch of goals and suboptimal scheduling. Users wish to reduce their time waiting for needed task output and the amount they will be charged for unneeded speculation, making it unclear to the user how many speculative tasks they should submit.

This thesis introduces 'batchactive' scheduling (combining batch and interactive characteristics) to exploit the inherent speculation in common application scenarios. With a batchactive scheduler, users submit explicitly labeled batches of speculative tasks exploring ambitious lines of inquiry, and users interactively request task outputs when these outputs are found to be needed. After receiving and considering an output for some time, a user decides whether to request more outputs, cancel tasks, or disclose new speculative tasks. Users are encouraged to disclose more computation for two reasons. First, batchactive scheduling intelligently prioritizes among speculative and non-speculative tasks, providing good wait-time-based metrics. Second, batchactive scheduling employs an incentive pricing mechanism which charges for only requested task outputs (i.e., unneeded speculative tasks are not charged), providing better cost-based metrics for users. These aspects can lead to higher billed server utilization, encouraging batchactive adoption by resource providers organized as either cost centers or profit centers.
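The incentive pricing rule described above (charge only for requested task outputs) can be sketched in a few lines. The `Task` fields, the `bill` function, and the choice to price by CPU seconds are illustrative assumptions; the abstract states only that unrequested speculative tasks are not charged.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    cpu_seconds: float       # resources the task consumed (assumed billing unit)
    requested: bool = False  # True once the user asks for this task's output


def bill(tasks, price_per_cpu_second):
    """Charge only for tasks whose output was requested, whatever else ran."""
    billable = sum(t.cpu_seconds for t in tasks if t.requested)
    return billable * price_per_cpu_second


batch = [
    Task("t1", 10.0),
    Task("t2", 20.0, requested=True),
    Task("t3", 30.0),  # ran speculatively, never requested, so not charged
]
charge = bill(batch, price_per_cpu_second=0.5)
```

Under this rule the user pays for 20 CPU seconds even though 60 were consumed, which is what removes the penalty for disclosing speculative work early.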


Dynamic Function Placement in Active Storage Clusters

Amiri

Petrou

Ganger

et al. 1999


Optimally partitioning application and filesystem functionality within a cluster of clients and servers is a difficult problem due to dynamic variations in application behavior, resource availability and workload mixes. This paper presents ABACUS, a run-time system that monitors and dynamically changes function placement for applications that manipulate large data sets. Several examples of data-intensive workloads are used to show the importance of proper function placement and its dependence on dynamic runtime characteristics, with performance differences frequently reaching 2-10X. We evaluate how well the ABACUS prototype adapts to run-time system behavior, including both long-term variation (e.g., filter selectivity) and short-term variation (e.g., multi-phase applications and inter-application resource contention). Our experiments with ABACUS indicate that it is possible to adapt in all of these situations and that the adaptation converges most quickly in those cases where the performance impact is most significant.
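To make the idea of run-time function placement concrete, here is a minimal sketch for a single filter function. It is not the ABACUS algorithm: the `PlacementMonitor` class, the sliding window, and the rule of minimizing bytes sent over the network are all assumptions chosen for illustration. The intuition is that a highly selective filter (output much smaller than input) is better run next to the data, while a function that expands its input is better run at the client.

```python
from collections import deque


class PlacementMonitor:
    """Toy monitor that re-decides where a filter should run.

    Assumed cost model: running on the server ships the filter's output
    across the network, running on the client ships its input.
    """

    def __init__(self, window=4):
        self.samples = deque(maxlen=window)  # recent (bytes_in, bytes_out) pairs
        self.placement = "client"

    def observe(self, bytes_in, bytes_out):
        self.samples.append((bytes_in, bytes_out))
        total_in = sum(i for i, _ in self.samples)
        total_out = sum(o for _, o in self.samples)
        # Place the function where less data has to cross the network.
        self.placement = "server" if total_out < total_in else "client"
        return self.placement


monitor = PlacementMonitor(window=2)
p1 = monitor.observe(100, 10)   # selective filter
p2 = monitor.observe(100, 150)  # window totals: 200 in, 160 out
p3 = monitor.observe(100, 150)  # window totals: 200 in, 300 out
```

The window makes the decision follow long-term changes such as filter selectivity while damping reactions to a single unusual sample.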


Easing the management of data-parallel systems via adaptation

Petrou

Amiri

Ganger

et al. 2000


Figure 1: Users running data-parallel applications across the Internet. The data stores are on server clusters, which, compared to monolithic machines, flexibly support different kinds of concurrent workloads, are easier to upgrade, and have the potential to support independent node faults. Our client machines, in contrast to the traditional view, are active collaborators with the clusters in providing the end-result to the user. (Figure labels: Fast Ether, Cable modem, WaveLAN, Storage, CPU, Server, Switch, Gigabit Ether, Server cluster, Clients.)


Scheduling speculative tasks in a compute farm

Petrou

Gibson

Ganger


Users often behave speculatively, submitting work that initially they do not know is needed. Farm computing often consists of single-node speculative tasks issued by, e.g., bioinformaticists comparing DNA sequences and computer graphics artists rendering scenes, who wish to reduce their time waiting for needed tasks and the amount they will be charged for unneeded speculation. Existing schedulers are not effective for such behavior. Our 'batchactive' scheduling exploits speculation: users submit explicitly labeled batches of speculative tasks, interactively request outputs when ready to process them, and cancel tasks found not to be needed. Users are encouraged to participate by a new pricing mechanism charging for only requested tasks no matter what ran.

Over a range of simulated user and task characteristics, we show that: batchactive scheduling improves visible response time (a new metric for speculative domains) by at least 2X for 20% of the simulations; batchactive scheduling supports higher billable load at lower visible response time, encouraging adoption by resource providers; and a batchactive policy favoring users who use more of their speculative tasks provides additional performance and resists a denial-of-service.
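The abstract names visible response time without defining it. The sketch below assumes one plausible reading: the time a user actually waits, measured from the moment an output is requested until it is available, and zero when the task already finished speculatively. Both function names are hypothetical.

```python
def visible_response_time(request_time, completion_time):
    """Waiting time seen by the user (assumed definition).

    If the task completed before its output was requested, the user
    does not wait at all.
    """
    return max(0.0, completion_time - request_time)


def mean_visible_response_time(events):
    """Average over (request_time, completion_time) pairs of requested tasks."""
    waits = [visible_response_time(r, c) for r, c in events]
    return sum(waits) / len(waits)


already_done = visible_response_time(request_time=10.0, completion_time=7.0)
still_running = visible_response_time(request_time=5.0, completion_time=9.0)
average = mean_visible_response_time([(10.0, 7.0), (5.0, 9.0)])
```

Under this reading, a scheduler lowers the metric by finishing speculative tasks before the user asks for them, which is why disclosing speculation early helps.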



David Petrou

GLUnix: a global layer unix for a network of workstations

Cluster scheduling for explicitly-speculative tasks

Dynamic Function Placement in Active Storage Clusters

Easing the management of data-parallel systems via adaptation

Scheduling speculative tasks in a compute farm
