In multicluster systems, and more generally in grids, jobs may require co-allocation, i.e., the simultaneous allocation of resources such as processors and input files in multiple clusters. While such jobs may have reduced runtimes because they have access to more resources, waiting for processors in multiple clusters and for the input files to become available in the right locations may introduce inefficiencies. Moreover, as single jobs now have to rely on multiple resource managers, co-allocation introduces reliability problems. In this paper, we present two additions to the original design of our KOALA co-allocating scheduler (different priority levels of jobs and incrementally claiming processors), and we report on our experiences with KOALA in our multicluster testbed while it was unstable.
SUMMARY: In multicluster systems, and more generally in grids, jobs may require co-allocation, that is, the simultaneous or coordinated access of single applications to resources of possibly multiple types in multiple locations managed by different resource managers. Co-allocation presents new challenges to resource management in grids, such as locating sufficient resources in geographically distributed sites, allocating and managing resources in multiple, possibly heterogeneous sites for single applications, and coordinating the execution of single jobs at multiple sites. Moreover, as single jobs now may have to rely on multiple resource managers, co-allocation introduces reliability problems. In this paper, we present the design and implementation of a co-allocating grid scheduler named KOALA that meets these co-allocation challenges. In addition, we report on the results of a performance analysis, conducted in our multicluster testbed, of the co-allocation policies built into KOALA. We also include the results of a performance and reliability test of KOALA while our testbed was unstable.
Abstract: In large-scale distributed execution environments such as multicluster systems and grids, resource availability may vary due to resource failures and because resources may be added to or withdrawn from such environments at any time. In addition, single sites in such systems may have to deal with workloads originating both from local users and from many other sources. As a result, application malleability, that is, the property of applications to deal with a varying amount of resources during their execution, may be very beneficial for performance. In this paper, we present the design of the support for malleability, and of the scheduling policies for it, in our KOALA multicluster scheduler with the help of our DYNACO framework for application malleability. In addition, we show the results of experiments with scheduling malleable workloads with KOALA in our DAS multicluster testbed.