An update on the scalability limits of the Condor batch system

Bradley, D.; Clair, T St; Farrellee, Matthew; Guo, Zhihan; Livny, Miron; Sfiligoi, I.; Tannenbaum, Todd

doi:10.1088/1742-6596/331/6/062002

Cited by 9 publications

(4 citation statements)

References 2 publications


“…In the future, this allows to add not only local institutes but also organizations and their hardware across the globe. This includes, but is not limited to, scalability in terms of the number of addressable hardware and scalability in terms of the spatial extent of the network [41][42][43][44]. Table 2 lists a number of currently available grid middleware components.…”

Section: Simulation Hardware (mentioning)

confidence: 99%

AixViPMaP®—an Operational Platform for Microstructure Modeling Workflows

Koschmieder

Hojda

Apel

et al. 2019

Integr Mater Manuf Innov


The present article describes the design, architecture, and implementation of the Aachen ("Aix") Virtual Platform for Materials Processing (AixViPMaP®). This simulation platform focuses on enabling automatic simulation workflows in the area of microstructure evolution and microstructure-property relationships by continuum models. Following a description of a variety of AixViPMaP® functionalities such as user management, the currently implemented software tools, simulation workflows, data storage, grid infrastructure, and many more, some example workflows which have been run on AixViPMaP® are presented in detail. These workflow examples, although each is specific, can readily be transferred to other materials or to similar processes, as the major simulation tools used in these workflows are all generic and thus applicable to a wide range of metals and technical alloys. The article concludes with a discussion of the performance and benefits of the platform, an outlook on its future development, and its open, future availability for both academic and commercial use.

“…There is however a major problem with this view: the central queue of tasks may become a bottleneck for the scheduling and execution of large-scale workloads [2], [8]. As the number of pilots increases, so does the rate of requests to be served.…”

Section: Problem Statement and Related Work (mentioning)

confidence: 99%

“…Existing pilot systems have already met this problem and have found different ways to tackle it. Tasks may be grouped and evaluated in bulk or pilots may be matched based on their site only [8]. In this case, we are effectively restricting or even giving up micro-scheduling.…”

Section: Problem Statement and Related Work (mentioning)

confidence: 99%

Distributed scheduling and data sharing in late-binding overlays

Peris

Hernández

Huedo

2014

2014 International Conference on High Performance Computing & Simulation (HPCS)


Pull-based late-binding overlays are used in some of today's largest computational grids. Job agents are submitted to resources with the duty of retrieving real workload from a central queue at runtime. This helps overcome the problems of these very complex environments, namely, heterogeneity, imprecise status information and relatively high failure rates. In addition, the late job assignment allows dynamic adaptation to changes in the grid conditions or user priorities. However, as the scale grows, the central assignment queue may become a bottleneck for the whole system. This article presents a distributed scheduling architecture for late-binding overlays, which addresses these scalability issues. Our system lets execution nodes build a distributed hash table and delegates job matching and assignment to them. This reduces the load on the central server and makes the system much more scalable and robust. Moreover, scalability makes fine-grained scheduling possible, and enables new functionalities like the implementation of a distributed data cache on the execution nodes, which helps alleviate the commonly congested grid storage services.
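The abstract above does not say which distributed hash table design the authors use. As a rough illustration of how job assignment could be delegated to execution nodes, the sketch below assumes a consistent-hash ring; the `HashRing` class, the node names, and the job identifiers are all hypothetical and are not taken from the paper.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (SHA-1 is an arbitrary choice here)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring: each job is owned by one execution node."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        # Sort nodes by their position on the ring.
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._points = [p for p, _ in self._ring]

    def owner(self, job_id: str) -> str:
        """Return the node whose ring position follows the job's hash, wrapping around."""
        i = bisect.bisect_right(self._points, _hash(job_id)) % len(self._ring)
        return self._ring[i][1]


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = HashRing(nodes)
jobs = [f"job-{i}" for i in range(200)]
before = {j: ring.owner(j) for j in jobs}

# If one node leaves, only the jobs it owned are reassigned.
smaller = HashRing([n for n in nodes if n != "node-b"])
after = {j: smaller.owner(j) for j in jobs}
```

In this sketch no central queue is consulted to decide who handles a job: any node can compute the owner locally, which is the property that would take load off a central server.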


“…High performance computing schedulers fall into this category: they optimize for large jobs with complex constraints, and target maximum throughput in the tens to hundreds of scheduling decisions per second (e.g., SLURM [10]). Similarly, Condor supports complex features including a rich constraint language, job checkpointing, and gang scheduling using a heavy-weight matchmaking process that results in maximum scheduling throughput of 10 to 100 jobs per second [4].…”

Section: Related Work (mentioning)

confidence: 99%

Sparrow

Ousterhout

Wendell

Zaharia

et al. 2013

Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

433


Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of tasks per second on appropriate machines while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a 110-machine cluster and demonstrate that Sparrow performs within 12% of an ideal scheduler.
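The "decentralized, randomized sampling approach" mentioned in the abstract can be illustrated in its simplest form: probe a few workers at random and place the task on the least loaded one. The sketch below is a toy model of that idea only, not Sparrow's actual implementation; the function names, the probe count `d`, and the use of queue length as the load measure are assumptions made for illustration.

```python
import random


def place_task(queue_lengths, d=2, rng=random):
    """Probe d distinct random workers and return the index of the least loaded one."""
    probes = rng.sample(range(len(queue_lengths)), d)
    return min(probes, key=lambda w: queue_lengths[w])


def simulate(num_workers=100, num_tasks=1000, d=2, seed=0):
    """Place tasks one at a time and return the final queue length of each worker."""
    rng = random.Random(seed)
    queues = [0] * num_workers
    for _ in range(num_tasks):
        worker = place_task(queues, d, rng)
        queues[worker] += 1  # tasks never finish in this toy model
    return queues


one_probe = simulate(d=1)   # purely random placement
two_probes = simulate(d=2)  # sample two workers, pick the shorter queue
print("max queue with d=1:", max(one_probe))
print("max queue with d=2:", max(two_probes))
```

Each placement touches only `d` workers, so no scheduler needs a global view of the cluster. That is why the approach avoids a central bottleneck of the kind the abstract contrasts it with.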
