2015
DOI: 10.1002/spe.2382

DataMill: a distributed heterogeneous infrastructure for robust experimentation

Abstract: Empirical systems research is facing a dilemma. Minor aspects of an experimental setup can have a significant impact on its associated performance measurements and potentially invalidate conclusions drawn from them. Examples of such influences, often called hidden factors, include binary link order, process environment size, compiler-generated randomized symbol names, or group scheduler assignments. The growth in complexity and size of modern systems will further aggravate this dilemma, especially with …

Cited by 6 publications (6 citation statements)
References 26 publications
“…-a set of input files, -the name of the tool to use, -command-line arguments for the tool (e.g., to specify a tool configuration), -the limits for CPU time, memory, and number of CPU cores, and -the number of runs that should be executed in parallel. This benchmark definition is given in XML format; an example is available in the tool documentation 25 and in Listing 2 of Appendix B. Additionally, a tool-info module (a tool-specific Python module) needs to be written that contains functions for creating a command-line string for a run (including input file and user-defined command-line arguments) and for determining the result from the exit code and the output of the tool. Such a tool-info module typically has under 50 lines of Python code, and needs to be written only once per tool.…”
Section: Benchmarking a Set Of Runs
mentioning
confidence: 99%
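The two responsibilities named in the statement above (building a command-line string for a run, and deriving a result from the exit code and output) can be illustrated with a minimal Python sketch. Every name here (`build_command`, `determine_result`, the `RESULT:` output prefix, the result labels) is a hypothetical assumption for illustration and not the interface of the cited benchmarking framework.

```python
import shlex


def build_command(executable, options, input_file):
    """Return a shell-safe command-line string for one run.

    Hypothetical helper: the argument order (options before the input
    file) is an assumption, not a documented convention.
    """
    parts = [executable, *options, input_file]
    return " ".join(shlex.quote(part) for part in parts)


def determine_result(exit_code, output_lines):
    """Map an exit code and the tool's output lines to a result label.

    Assumes, purely for illustration, that the tool prints a line of
    the form "RESULT: <label>" when it finishes successfully.
    """
    if exit_code != 0:
        return "ERROR"
    for line in output_lines:
        text = line.strip().upper()
        if text.startswith("RESULT:"):
            return text.split(":", 1)[1].strip()
    return "UNKNOWN"
```

A module of this shape stays short because it only translates between the benchmark definition and one specific tool; measurement and resource limits would live elsewhere.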
“…The authors of DataMill [25] propose to make benchmarking more reliable by explicitly varying as many hardware and software factors as possible in a controlled manner while benchmarking, e.g., the hardware architecture, CPU model, memory size, link order, etc. To do so, they rely on a diverse set of worker machines, which are rebooted for each benchmark run into a specific OS installation.…”
Section: Benchmarking Strategies
mentioning
confidence: 99%
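Varying factors "in a controlled manner", as the statement above puts it, can be pictured as enumerating combinations of factor levels and running them in a randomized order. The sketch below assumes a full-factorial design with a seeded shuffle; the function name `plan_runs` and the example factor levels are hypothetical, and DataMill's actual experiment design may differ.

```python
import itertools
import random


def plan_runs(factors, seed=0):
    """Enumerate every combination of factor levels, then shuffle.

    `factors` maps a factor name to its list of levels. Shuffling with
    a fixed seed randomizes run order while keeping the plan repeatable.
    This is an illustrative full-factorial design, not DataMill's own.
    """
    names = sorted(factors)
    level_lists = [factors[name] for name in names]
    combos = [dict(zip(names, levels))
              for levels in itertools.product(*level_lists)]
    random.Random(seed).shuffle(combos)
    return combos


# Hypothetical factors and levels, loosely following the examples in the text.
example_factors = {
    "link_order": ["default", "reversed"],
    "memory_gb": [2, 4, 8],
}
example_plan = plan_runs(example_factors)
```

With two link orders and three memory sizes, the plan contains six runs, one per combination.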
“…This facilitated the communication and ensured transparency. To run the experiments, we used DataMill [30], to ensure robust and reproducible experiments. We selected the most powerful and general-purpose machine and evaluated all submissions on this machine.…”
Section: The Early Years: 2014-2016
mentioning
confidence: 99%
“…Many different competitions exist in which tools are evaluated in terms of performance (evaluation of their usability is rare). Experiments are typically conducted on a representative set of benchmark problems and executed by benchmarking environments like BenchExec [16], BenchKit [17], DataMill [18], or StarExec [19]. The oldest competitions concern Boolean satisfiability (SAT) solvers [20], initiated three decades ago, and Automated Theorem Provers (ATP) [21].…”
mentioning
confidence: 99%