Evaluation and Assessment in Software Engineering 2021
DOI: 10.1145/3463274.3463361
Benchmarking as Empirical Standard in Software Engineering Research

Abstract: In empirical software engineering, benchmarks can be used for comparing different methods, techniques and tools. However, the recent ACM SIGSOFT Empirical Standards for Software Engineering Research do not include an explicit checklist for benchmarking. In this paper, we discuss benchmarks for software performance and scalability evaluation as example research areas in software engineering, relate benchmarks to some other empirical research methods, and discuss the requirements on benchmarks that may constitut…

Cited by 16 publications (16 citation statements)
References 36 publications (46 reference statements)
“…The recently published ACM SIGSOFT Empirical Standard for Benchmarking (Ralph et al 2021; Hasselbring 2021) names four essential components of a benchmark:
- the quality to be benchmarked (e.g., performance, availability, scalability, security)
- the metric(s) to quantify the quality
- the measurement method(s) for the metric (if not obvious)
- the workload, usage profile and/or task sample the system under test is subject to (i.e., what the system is doing when the measures are taken)…”
Section: Components of Benchmarks
confidence: 99%
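The four components named in the quoted standard can be pictured as a minimal configuration object. The sketch below is hypothetical and not part of the standard; the `Benchmark` class and all field names are illustrative choices only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Benchmark:
    """Hypothetical sketch of the four benchmark components named in the
    ACM SIGSOFT Empirical Standard for Benchmarking (illustrative only)."""
    quality: str                              # quality to be benchmarked, e.g. "performance"
    metrics: List[str]                        # metric(s) quantifying the quality
    measure: Callable[[], Dict[str, float]]   # measurement method producing metric values
    workload: Callable[[], None]              # workload / usage profile for the system under test

    def run(self) -> Dict[str, float]:
        self.workload()       # exercise the system under test
        return self.measure() # take the measures

# Usage: a toy performance benchmark with a fixed, fake measurement.
bench = Benchmark(
    quality="performance",
    metrics=["latency_ms"],
    measure=lambda: {"latency_ms": 12.3},
    workload=lambda: None,
)
result = bench.run()
```

Making the workload and measurement method explicit, separate fields mirrors the standard's point that they are distinct components: the same workload can be reused with different metrics, and vice versa.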
“…In empirical software engineering research, benchmarks are an established research method to compare different methods, techniques, and tools based on a standardized method (Sim et al 2003; Tichy 2014; Hasselbring 2021). For traditional performance attributes such as latency or throughput, well-known (and often straightforward) metrics and measurement methods exist (Kounev et al 2020).…”
Section: Introduction
confidence: 99%
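For latency and throughput, the "straightforward" measurement the quote alludes to can be sketched with a small timing harness. This is a generic illustration, not the method from Kounev et al.; `operation` and the function name are placeholder assumptions.

```python
import time

def measure_latency_and_throughput(operation, n=1000):
    """Run `operation` n times; report mean latency (s) and throughput (ops/s).
    A minimal sketch of two classic performance metrics."""
    latencies = []
    start = time.perf_counter()
    for _ in range(n):
        t0 = time.perf_counter()
        operation()                               # the measured operation
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "mean_latency_s": sum(latencies) / n,     # average per-operation time
        "throughput_ops_s": n / elapsed,          # completed operations per second
    }

# Usage with a trivial CPU-bound operation.
stats = measure_latency_and_throughput(lambda: sum(range(100)), n=100)
```

In practice one would also report percentiles (e.g., p95/p99 latency) rather than only the mean, since tail behavior often matters more than the average.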
“…Based on guidelines on benchmarking best practices [29, 78] and inspired by the microservice benchmark suite DeathStarBench [26], we formulate the following design principles:…”
Section: Design Principles
confidence: 99%
“…We conduct a performance benchmarking experiment [29] with an open-loop load generator in the data center region Northern Virginia (us-east-1), as commonly used by other serverless studies [10, 14, 16, 81, 82]. We collected over 7.5 million traces through over 12 months of experimentation in 2021 and 2022.…”
Section: Experiments Design
confidence: 99%
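An open-loop load generator, as used in the quoted experiment, issues requests on a fixed schedule regardless of how long responses take; a closed-loop generator would instead wait for each response before sending the next. A minimal threaded sketch, assuming a caller-supplied `request_fn` (all names here are illustrative, not from the cited study):

```python
import threading
import time

def open_loop_load(request_fn, rate_per_s, duration_s):
    """Fire `request_fn` at a fixed rate for `duration_s` seconds.
    Requests run in their own threads, so slow responses never delay
    the send schedule (the defining property of an open loop)."""
    interval = 1.0 / rate_per_s
    threads = []
    start = time.perf_counter()
    next_fire = start
    while next_fire - start < duration_s:
        now = time.perf_counter()
        if now < next_fire:
            time.sleep(next_fire - now)   # wait until the scheduled send time
        t = threading.Thread(target=request_fn)
        t.start()
        threads.append(t)
        next_fire += interval             # schedule is fixed, not response-driven
    for t in threads:
        t.join()
    return len(threads)                   # number of requests issued

# Usage: ~50 req/s for 0.2 s against a simulated 10 ms operation.
count = open_loop_load(lambda: time.sleep(0.01), rate_per_s=50, duration_s=0.2)
```

Because the schedule is fixed, an overloaded system under test accumulates queued requests instead of throttling the generator, which is exactly why open-loop designs expose latency degradation that closed-loop designs can hide.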
“…While there might be indications that one of the models is better, how can you know for sure? We need a reliable approach for benchmarking different models [19], i.e., test automation that helps us detect if any GDMs digress from acceptable behavior.…”
confidence: 99%