Proceedings of the 2013 International Symposium on Memory Management
DOI: 10.1145/2491894.2464160

Rigorous benchmarking in reasonable time

Abstract: Experimental evaluation is key to systems research. Because modern systems are complex and non-deterministic, good experimental methodology demands that researchers account for uncertainty. To obtain valid results, they are expected to run many iterations of benchmarks, invoke virtual machines (VMs) several times, or even rebuild VM or benchmark binaries more than once. All this repetition costs time to complete experiments. Currently, many evaluations give up on sufficient repetition or rigorous statistical methods […]

Cited by 36 publications (24 citation statements) | References 21 publications
“…Once machine code generation has completed, the VM is said to have finished warming up, and the program is said to be executing at a steady state of peak performance. While the length of the warmup period is dependent on the program and JIT compiler, all JIT compiling VMs are based on this performance model [Kalibera and Jones 2013].…”
Section: Discussion
confidence: 99%
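
To make the quoted warmup model concrete, here is a minimal timing-harness sketch in Python. The workload in run_iteration is a hypothetical stand-in, and CPython has no JIT compiler, so the point is the shape of the harness rather than its output on this interpreter.

```python
import time

def run_iteration():
    # Hypothetical stand-in for one in-process benchmark iteration.
    return sum(i * i for i in range(100_000))

# Time repeated in-process iterations. Under the quoted model, early
# iterations run slowly while the JIT compiles hot code; later ones
# settle at a steady state of peak performance.
times = []
for _ in range(50):
    start = time.perf_counter()
    run_iteration()
    times.append(time.perf_counter() - start)

# Arbitrary 10-iteration cut-off, purely for illustration; Kalibera and
# Jones argue the warmup length must be established per benchmark.
warmup, steady = times[:10], times[10:]
print(f"mean warmup iteration: {sum(warmup) / len(warmup):.6f}s")
print(f"mean steady iteration: {sum(steady) / len(steady):.6f}s")
```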
“…Georges et al [2007]. Kalibera and Jones [2013] convincingly show the limitations of such approaches, presenting instead a manual approach to determining if and when a steady state has been reached. While this is a significant improvement on previous methods, it is time-consuming, prone to human inconsistency, and gives no indication as to whether the steady state represents peak performance or not.…”
Section: Overview Of the Methodologymentioning
confidence: 99%
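
For contrast with the manual approach the quotation describes, the sketch below shows the kind of automated steady-state rule (a sliding-window coefficient-of-variation test in the spirit of Georges et al.) whose limitations Kalibera and Jones demonstrate. The window size and threshold are arbitrary assumptions, which is precisely the weakness of such heuristics.

```python
import statistics

def steady_state_start(times, window=5, cov_threshold=0.02):
    """Index of the first sliding window whose coefficient of variation
    drops below the threshold, or None if the run never stabilises."""
    for i in range(len(times) - window + 1):
        w = times[i:i + window]
        if statistics.stdev(w) / statistics.mean(w) < cov_threshold:
            return i
    return None

# Example: a run that stabilises after four noisy warmup iterations.
print(steady_state_start([3.1, 2.4, 1.9, 1.3, 1.01, 1.00, 1.02, 1.01, 0.99]))
```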
“…Note that the slightly higher geomean gains on the AMD system, as compared to the Intel system, are due to the underlying differences in the architecture, and such improved gains have been observed across all the kernels (both high-sync-op and low-sync-op). Since the speedups in Figure are small (close to 1×), we also report the confidence intervals for the speedup ratio (as defined by Kalibera and Jones) for the 16-core Intel system and 64-core AMD system in Figure . The narrow width of the confidence intervals shows that the execution time is fairly stable across different runs.…”
Section: Implementation and Evaluation
confidence: 99%
“…This is because the latter includes a series of parallel-for-loops leading to significant task creation and termination overheads, which is avoided in the former because of the use of clocks. Since the speedups in Figure are small (close to 1×), we also report the confidence intervals for the speedup ratio (as defined by Kalibera and Jones) of the async-finish kernel versions compared to the baseline and uClocks versions, for two of the highest configurations (the 16-core Intel system and 64-core AMD system in Figure ).…”
Section: Implementation and Evaluation
confidence: 99%
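
Both statements above rest on a confidence interval for a speedup ratio. As a rough illustration only, this sketch computes a percentile-bootstrap interval for a ratio of means; Kalibera and Jones's actual construction accounts for nested levels of repetition (binary builds, VM invocations, in-process iterations), which this flat, single-level resample deliberately omits.

```python
import random
import statistics

def bootstrap_speedup_ci(baseline, optimized, reps=10_000, alpha=0.05):
    """Percentile-bootstrap CI for mean(baseline) / mean(optimized).
    A simplified stand-in, NOT the Kalibera-Jones interval."""
    ratios = sorted(
        statistics.mean(random.choices(baseline, k=len(baseline)))
        / statistics.mean(random.choices(optimized, k=len(optimized)))
        for _ in range(reps)
    )
    return ratios[int(reps * alpha / 2)], ratios[int(reps * (1 - alpha / 2))]

# Illustrative timings (seconds); a speedup interval that straddles 1x
# cannot be distinguished from noise.
lo, hi = bootstrap_speedup_ci([2.03, 1.98, 2.05, 2.01],
                              [1.96, 2.00, 1.94, 1.97])
print(f"95% CI for speedup: [{lo:.3f}, {hi:.3f}]")
```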
“…Performance measurements may lead to incorrect results if not handled carefully [1]. Thus, a statistically rigorous performance evaluation is required [16, 23, 28]. To mitigate instability and incorrect results, we differentiate VM start-up and steady-state.…”
Section: Corpus
confidence: 99%
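
The start-up versus steady-state distinction drawn in this last statement can be sketched as two separate measurement loops: fresh VM invocations for start-up, many in-process iterations within one invocation for steady state. Every command name and flag below is hypothetical.

```python
import subprocess
import time

# Hypothetical benchmark jar and flags, invented for illustration.
JVM = ["java", "-jar", "bench.jar"]

# Start-up: time several *fresh* VM invocations end to end, so class
# loading and JIT compilation are paid inside every sample.
startup = []
for _ in range(10):
    t0 = time.perf_counter()
    subprocess.run(JVM + ["--iterations", "1"], check=True,
                   capture_output=True)
    startup.append(time.perf_counter() - t0)
print(f"mean start-up: {sum(startup) / len(startup):.3f}s over 10 invocations")

# Steady state: one invocation, many in-process iterations, with the
# benchmark itself reporting per-iteration times after warmup.
subprocess.run(JVM + ["--iterations", "50", "--report-after-warmup"],
               check=True)
```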