2011
DOI: 10.1088/1742-6596/331/6/062002

An update on the scalability limits of the Condor batch system

Abstract: Condor is being used extensively in the HEP environment. It is the batch system of choice for many compute farms, including several WLCG Tier 1s, Tier 2s and Tier 3s. It is also the building block of one of the Grid pilot infrastructures, namely glideinWMS. As with any software, Condor does not scale indefinitely with the number of users and/or the number of resources being handled. In this paper we are presenting the current observed scalability limits of both the latest production and the latest development …

Cited by 9 publications (4 citation statements)
References 2 publications
“…In the future, this allows to add not only local institutes but also organizations and their hardware across the globe. This includes, but is not limited to, scalability in terms of the number of addressable hardware and scalability in terms of the spatial extent of the network [41][42][43][44]. Table 2 lists a number of currently available grid middleware components.…”
Section: Simulation Hardware (mentioning)
confidence: 99%
“…There is however a major problem with this view: the central queue of tasks may become a bottleneck for the scheduling and execution of large-scale workloads [2], [8]. As the number of pilots increases, so does the rate of requests to be served.…”
Section: Problem Statement and Related Work (mentioning)
confidence: 99%
“…Existing pilot systems have already met this problem and have found different ways to tackle it. Tasks may be grouped and evaluated in bulk or pilots may be matched based on their site only [8]. In this case, we are effectively restricting or even giving up micro-scheduling.…”
Section: Problem Statement and Related Work (mentioning)
confidence: 99%
“…High performance computing schedulers fall into this category: they optimize for large jobs with complex constraints, and target maximum throughput in the tens to hundreds of scheduling decisions per second (e.g., SLURM [10]). Similarly, Condor supports complex features including a rich constraint language, job checkpointing, and gang scheduling using a heavy-weight matchmaking process that results in maximum scheduling throughput of 10 to 100 jobs per second [4].…”
Section: Related Work (mentioning)
confidence: 99%