2014
DOI: 10.1016/j.jpdc.2014.06.013

Experience with using the Parallel Workloads Archive

Abstract: Science is based upon observation. The scientific study of complex computer systems should therefore be based on observation of how they are used in practice, as opposed to how they are assumed to be used or how they were designed to be used. In particular, detailed workload logs from real computer systems are invaluable for research on performance evaluation and for designing new systems. Regrettably, workload data may suffer from quality issues that might distort the study results, just as scientific observat…

Cited by 235 publications (142 citation statements)
References: 40 publications
“…For example, consider the instantaneous utilization data at several large parallel machines shown in Figure 2.19, namely the fraction of processors that are actively serving some job at each instant; it is calculated by tracking the start and end time of each job, and summing the processors used by jobs that have started but have not yet terminated. Naturally, this should be bounded by the number of processors in the system, but the data seems to indicate that in many instances more than 100% of the processors were allocated [250]. However, such exceptions are not easily attributable to any specific subset of jobs, so it is not clear which jobs should be removed to fix the problem.…”
Section: Noise and Errors (mentioning; confidence: 99%)
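The utilization calculation described in the excerpt above is a sweep over job start and end events: add a job's processors when it starts, subtract them when it ends, and compare the running total against the machine size. Below is a minimal sketch, assuming each log record already gives a start time, an end time (start plus run time), and a processor count; the `Job` type and its field names are hypothetical, not taken from any archive tooling.

```python
from typing import List, NamedTuple, Tuple


class Job(NamedTuple):
    start: float  # time the job started running (hypothetical field name)
    end: float    # time the job terminated, i.e. start + run time
    procs: int    # number of processors allocated to the job


def utilization_profile(jobs: List[Job]) -> List[Tuple[float, int]]:
    """Return (time, busy_processors) at every instant where the count changes."""
    events = []
    for job in jobs:
        events.append((job.start, job.procs))  # job starts: processors become busy
        events.append((job.end, -job.procs))   # job ends: processors are released
    # Sort by time; at equal times the negative deltas (terminations) come first,
    # so a job ending at t is not counted together with one starting at t.
    events.sort()
    profile: List[Tuple[float, int]] = []
    busy = 0
    for t, delta in events:
        busy += delta
        if profile and profile[-1][0] == t:
            profile[-1] = (t, busy)  # several events at one instant: keep the final count
        else:
            profile.append((t, busy))
    return profile


def overallocated(jobs: List[Job], capacity: int) -> List[Tuple[float, int]]:
    """Return the instants at which more processors are busy than the machine has."""
    return [(t, busy) for t, busy in utilization_profile(jobs) if busy > capacity]


if __name__ == "__main__":
    machine_size = 128
    log = [Job(0, 10, 100), Job(5, 15, 50)]
    for t, busy in utilization_profile(log):
        print(f"t={t}: {busy}/{machine_size} = {busy / machine_size:.0%}")
    print("over-allocated:", overallocated(log, machine_size))
```

In the usage example the two jobs overlap between t=5 and t=10, so 150 processors appear busy on a 128-processor machine, which is the kind of above-100% utilization the excerpt describes. Note that the sweep only flags the instants where the bound is violated; as the excerpt points out, it does not tell you which of the overlapping jobs is at fault.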
“…and experience with using it. For example, it is important to share information about problems in the data, and any data cleaning that was performed [250].…”
Section: Sharing Data (mentioning; confidence: 99%)
“…The workloads we consider in this work match the cluster-based traces of the Parallel Workloads Archive [20]. We further assume that jobs are CPU-bound and their runtime depends linearly on the speed of the (virtual) processor where they are executed.…”
Section: Workload and Resource Model (mentioning; confidence: 99%)
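The excerpt above states that CPU-bound jobs have a runtime that depends linearly on the speed of the (virtual) processor, but it does not give the exact formula. One common reading, assumed here, is that a job performs a fixed amount of work (runtime multiplied by speed), so replaying a trace job on a processor of a different speed rescales its runtime by the ratio of the two speeds. The function name and parameters below are hypothetical.

```python
def scaled_runtime(trace_runtime: float, trace_speed: float, vm_speed: float) -> float:
    """Runtime of a CPU-bound job moved to a (virtual) processor of another speed.

    Assumption: the job's work, runtime * speed, stays constant, so the new
    runtime is trace_runtime * trace_speed / vm_speed.
    """
    if trace_speed <= 0 or vm_speed <= 0:
        raise ValueError("processor speeds must be positive")
    work = trace_runtime * trace_speed  # amount of computation, in speed-seconds
    return work / vm_speed


if __name__ == "__main__":
    # A job that ran for 3600 s on the original machine, replayed on a VM twice as fast
    print(scaled_runtime(3600.0, 1.0, 2.0))
```

Under this assumption, doubling the processor speed halves the runtime; a model in which only part of the job is CPU-bound would need an extra term.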
“…The synthetic workloads are short-term but with significantly different job arrival patterns, allowing us to better characterize the impact of the arrival process on portfolio scheduling. The real workload is a whole trace from the Parallel Workloads Archive [20] and allows us to gain valuable insight into the operation of our portfolio scheduler in realistic conditions. Synthetic Workloads: In this paper, we generate five types of workloads that have different user behaviors but the same (real) job run times.…”
Section: Workloads (mentioning; confidence: 99%)
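The excerpt above describes synthetic workloads that share the same real job run times but differ in their job arrival patterns. It does not say how the arrivals were generated, so the sketch below only illustrates the general idea: keep the run times and swap in a different arrival process. The two generators (a Poisson process and a hypothetical bursty pattern) and all function names are assumptions for illustration, and the run times in the usage example are placeholders, not trace data.

```python
import random
from typing import List, Tuple


def poisson_arrivals(n: int, rate: float, rng: random.Random) -> List[float]:
    """Arrival times with exponential gaps, i.e. a Poisson process of the given rate."""
    times, t = [], 0.0
    for _ in range(n):
        t += rng.expovariate(rate)
        times.append(t)
    return times


def bursty_arrivals(n: int, burst_size: int, gap: float, rng: random.Random) -> List[float]:
    """Hypothetical pattern: bursts of burst_size jobs spread over 1 s, bursts gap seconds apart."""
    times = [(i // burst_size) * gap + rng.uniform(0.0, 1.0) for i in range(n)]
    return sorted(times)


def synthesize(runtimes: List[float], arrivals: List[float]) -> List[Tuple[float, float]]:
    """Pair real run times (in trace order) with synthetic arrival times."""
    if len(runtimes) != len(arrivals):
        raise ValueError("need one arrival time per job")
    return list(zip(sorted(arrivals), runtimes))


if __name__ == "__main__":
    rng = random.Random(42)
    real_runtimes = [120.0, 3600.0, 45.0, 900.0, 60.0, 7200.0]  # placeholder values
    for name, arrivals in [
        ("poisson", poisson_arrivals(len(real_runtimes), 0.01, rng)),
        ("bursty", bursty_arrivals(len(real_runtimes), 3, 600.0, rng)),
    ]:
        print(name, synthesize(real_runtimes, arrivals))
```

Because only the arrival times change between variants, any difference in scheduler behavior across them can be attributed to the arrival process rather than to the job mix.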