DataMill: a distributed heterogeneous infrastructure for robust experimentation

Petkovich, Jean-Christophe; Oliveira, Augusto Born de; Zhang, Y.; Reidemeister, Thomas; Fischmeister, Sebastian

doi:10.1002/spe.2382

Cited by 6 publications

(6 citation statements)

References 26 publications


“…-a set of input files, -the name of the tool to use, -command-line arguments for the tool (e.g., to specify a tool configuration), -the limits for CPU time, memory, and number of CPU cores, and -the number of runs that should be executed in parallel. This benchmark definition is given in XML format; an example is available in the tool documentation 25 and in Listing 2 of Appendix B. Additionally, a tool-info module (a tool-specific Python module) needs to be written that contains functions for creating a command-line string for a run (including input file and user-defined command-line arguments) and for determining the result from the exit code and the output of the tool. Such a tool-info module typically has under 50 lines of Python code, and needs to be written only once per tool.…”

Section: Benchmarking a Set of Runs (mentioning)

confidence: 99%
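The inputs and the tool-info responsibilities described in the quote above can be illustrated with a small Python sketch. It is a hypothetical stand-in, not BenchExec's actual API: the class and field names, the imaginary tool `exampleverifier`, and its `VERDICT:` output format are all assumptions. It only shows the two jobs the quote assigns to a tool-info module: building the command line for a run, and mapping exit code plus output to a result.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical result labels (assumption); BenchExec defines its own constants.
RESULT_TRUE = "true"
RESULT_FALSE = "false"
RESULT_ERROR = "error"
RESULT_UNKNOWN = "unknown"


@dataclass
class BenchmarkDefinition:
    """Hypothetical in-memory stand-in for the XML benchmark definition."""
    input_files: List[str]
    tool: str
    options: List[str] = field(default_factory=list)
    cpu_time_limit_s: int = 900      # CPU-time limit per run
    memory_limit_mb: int = 8000      # memory limit per run
    cpu_cores: int = 2               # CPU cores per run
    parallel_runs: int = 1           # runs executed in parallel


class ExampleToolInfo:
    """Hypothetical tool-info module for an imaginary verifier."""

    executable = "exampleverifier"

    def cmdline(self, options: List[str], input_file: str) -> List[str]:
        # Executable, then user-defined options, then the input file.
        return [self.executable, *options, input_file]

    def determine_result(self, exit_code: int, output: List[str]) -> str:
        # Treat a non-zero exit code as a tool failure.
        if exit_code != 0:
            return RESULT_ERROR
        # Otherwise look for the (assumed) verdict line in the tool output.
        for line in output:
            if line.startswith("VERDICT: TRUE"):
                return RESULT_TRUE
            if line.startswith("VERDICT: FALSE"):
                return RESULT_FALSE
        return RESULT_UNKNOWN


def plan_runs(definition: BenchmarkDefinition,
              tool_info: ExampleToolInfo) -> List[List[str]]:
    # One command line per input file, all sharing the same options.
    return [tool_info.cmdline(definition.options, f)
            for f in definition.input_files]


if __name__ == "__main__":
    bench = BenchmarkDefinition(input_files=["a.c", "b.c"],
                                tool="exampleverifier", options=["--fast"])
    info = ExampleToolInfo()
    print(plan_runs(bench, info))
    print(info.determine_result(0, ["parsing...", "VERDICT: FALSE"]))
```

Keeping the tool-specific logic in one small class mirrors the point made in the quote: the benchmarking framework stays tool-independent, and only this thin adapter has to be written once per tool.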

“…The authors of DataMill [25] propose to make benchmarking more reliable by explicitly varying as many hardware and software factors as possible in a controlled manner while benchmarking, e.g., the hardware architecture, CPU model, memory size, link order, etc. To do so, they rely on a diverse set of worker machines, which are rebooted for each benchmark run into a specific OS installation.…”

Section: Benchmarking Strategies (mentioning)

confidence: 99%
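The idea of explicitly varying hardware and software factors in a controlled manner can be sketched as a full-factorial experiment design with a randomized run order. This is not DataMill's implementation: the factor names and levels below are hypothetical, and dispatching each configuration to a matching worker machine (and rebooting it into the requested OS installation) is assumed rather than modeled.

```python
import itertools
import random
from typing import Dict, List

# Hypothetical factor levels (assumption); a real deployment would derive
# these from the worker machines and OS installations actually available.
FACTORS: Dict[str, List[str]] = {
    "architecture": ["x86_64", "arm64"],
    "memory": ["8GB", "32GB"],
    "link_order": ["default", "reversed"],
}


def full_factorial(factors: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Return every combination of factor levels, one dict per configuration."""
    names = list(factors)
    return [dict(zip(names, levels))
            for levels in itertools.product(*(factors[n] for n in names))]


def schedule(factors: Dict[str, List[str]], repetitions: int,
             seed: int = 0) -> List[Dict[str, str]]:
    """Repeat each configuration and shuffle the run order.

    Shuffling with a fixed seed keeps the schedule reproducible while
    avoiding a run order in which time-dependent effects line up with a
    single factor. Each entry would then be sent to a worker whose
    hardware matches the configuration (not modeled here).
    """
    runs = [cfg for cfg in full_factorial(factors)
            for _ in range(repetitions)]
    random.Random(seed).shuffle(runs)
    return runs


if __name__ == "__main__":
    configs = full_factorial(FACTORS)
    print(len(configs))                  # 2 * 2 * 2 configurations
    print(len(schedule(FACTORS, 3)))     # each configuration repeated 3 times
```

A full-factorial design grows multiplicatively with the number of factors and levels, which is one reason a large, diverse pool of worker machines matters for this strategy.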

“…https://github.com/sosy-lab/benchexec/blob/master/doc/run-results.md 25 https://github.com/sosy-lab/benchexec/blob/master/doc/benchexec.md…”

mentioning

confidence: 99%


Reliable benchmarking: requirements and solutions. Beyer, Löwe, Wendler (2017). Int J Softw Tools Technol Transfer.

122


Benchmarking is a widely used method in experimental computer science, in particular, for the comparative evaluation of tools and algorithms. As a consequence, a number of questions need to be answered in order to ensure proper benchmarking, resource measurement, and presentation of results, all of which is essential for researchers, tool developers, and users, as well as for tool competitions. We identify a set of requirements that are indispensable for reliable benchmarking and resource measurement of time and memory usage of automatic solvers, verifiers, and similar tools, and discuss limitations of existing methods and benchmarking tools. Fulfilling these requirements in a benchmarking framework can (on Linux systems) currently only be done by using the cgroup and namespace features of the kernel. We developed BenchExec, a ready-to-use, tool-independent, and open-source implementation of a benchmarking framework that fulfills all presented requirements, making reliable benchmarking and resource measurement easy. Our framework is able to work with a wide range of different tools, has proven its reliability and usefulness in the International Competition on Software Verification, and is used by several research groups worldwide to ensure reliable benchmarking. Finally, we present guidelines on how to present measurement results in a scientifically valid and comprehensible way.


“…This facilitated the communication and ensured transparency. To run the experiments, we used DataMill [30], to ensure robust and reproducible experiments. We selected the most powerful and general-purpose machine and evaluated all submissions on this machine.…”

Section: The Early Years: 2014-2016 (mentioning)

confidence: 99%

International Competition on Runtime Verification (CRV). Bartocci, Falcone, Reger (2019). Tools and Algorithms for the Construction and Analysis of Systems.


“…Many different competitions exist in which tools are evaluated in terms of performance (evaluation of their usability is rare). Experiments are typically conducted on a representative set of benchmark problems and executed by benchmarking environments like BenchExec [16], BenchKit [17], DataMill [18], or StarExec [19]. The oldest competitions concern Boolean satisfiability (SAT) solvers [20], initiated three decades ago, and Automated Theorem Provers (ATP) [21].…”

mentioning

confidence: 99%

Empirical Formal Methods: Guidelines for Performing Empirical Studies on Formal Methods. Beek, Ferrari (2022). Software.


Empirical studies on formal methods and tools are rare. In this paper, we provide guidelines for such studies. We mention their main ingredients and then define nine different study strategies (usability testing, laboratory experiments with software and human subjects, case studies, qualitative studies, surveys, judgement studies, systematic literature reviews, and systematic mapping studies) and discuss for each of them their crucial characteristics, the difficulties of applying them to formal methods and tools, typical threats to validity, their maturity in formal methods, pointers to external guidelines, and pointers to studies in other fields. We conclude with a number of challenges for empirical formal methods.
