2021
DOI: 10.1002/widm.1441

Over‐optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results

Abstract: In recent years, the need for neutral benchmark studies that focus on the comparison of methods coming from computational sciences has been increasingly recognized by the scientific community. While general advice on the design and analysis of neutral benchmark studies can be found in recent literature, a certain flexibility always exists. This includes the choice of data sets and performance measures, the handling of missing performance values, and the way the performance values are aggregated over the data s…

Cited by 19 publications (32 citation statements)
References 63 publications (134 reference statements)
“…There are many possible options of dealing with missing data, the most fundamental decision being whether missing data should be deleted or replaced by plausible values, a technique also known as imputation. If a researcher decides to impute missing data, they can choose from a plethora of imputation methods, all leading to slightly different replacement values that can influence the results of statistical hypothesis tests (Nießl et al, 2021).…”
Section: (10) Favorable Imputation (citation type: mentioning; confidence: 99%)
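The point that the handling of missing values can change a study's conclusion also applies to missing performance values in a benchmark. Below is a minimal Python sketch under stated assumptions: the accuracy table, the data set names, the methods A, B and C, and the helpers `complete_case` and `impute_worst` are all hypothetical and made up for illustration. They are not taken from Nießl et al (2021). The sketch compares two handling strategies, dropping incomplete data sets and imputing the worst observed value, and shows that the "best" method can differ between them.

```python
from statistics import mean

# Hypothetical accuracy values for three methods on four data sets;
# None marks a missing performance value (e.g. the method failed to run).
PERF = {
    "ds1": {"A": 0.90, "B": 0.85, "C": 0.80},
    "ds2": {"A": 0.70, "B": 0.75, "C": 0.72},
    "ds3": {"A": None, "B": 0.50, "C": 0.70},
    "ds4": {"A": 0.88, "B": 0.80, "C": 0.86},
}
METHODS = ["A", "B", "C"]


def complete_case(perf):
    """Drop every data set on which at least one method has a missing value."""
    return {ds: row for ds, row in perf.items() if None not in row.values()}


def impute_worst(perf):
    """Replace a missing value by the worst observed value on the same data set."""
    out = {}
    for ds, row in perf.items():
        worst = min(v for v in row.values() if v is not None)
        out[ds] = {m: (worst if v is None else v) for m, v in row.items()}
    return out


def winner(perf):
    """Return the method with the highest mean performance over the data sets."""
    scores = {m: mean(row[m] for row in perf.values()) for m in METHODS}
    return max(scores, key=scores.get)


for name, strategy in [("complete case", complete_case),
                       ("worst-value imputation", impute_worst)]:
    print(f"{name}: best method = {winner(strategy(PERF))}")
```

With these invented numbers, complete-case analysis favours A, while worst-value imputation penalises A for its missing result and favours C. Neither choice is wrong in itself, which is why the handling of missing values is best fixed before the results are seen.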
“…Here, several approaches exist. Most commonly, the methods are ranked according to their performance and results are presented as summaries of this ranking, see Nießl et al (2021) for a detailed discussion. As pointed out by Boulesteix et al (2013), the concepts of meta-analysis could also be extended for the framework of method comparison studies.…”
Section: Discussion (citation type: mentioning; confidence: 99%)
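The ranking-based summary described in this passage can be sketched as follows. This is a minimal Python illustration under stated assumptions. The accuracy table is invented. Methods are ranked within each data set, with rank 1 for the highest value and tied methods sharing the average rank. The mean rank is only one of several possible summaries of the ranking. Because method B wins by a large margin on a single data set, averaging the raw values and averaging the ranks point to different "best" methods.

```python
from statistics import mean

# Hypothetical accuracy values (no missing values) for three methods on four data sets.
PERF = {
    "ds1": {"A": 0.60, "B": 0.95, "C": 0.62},
    "ds2": {"A": 0.80, "B": 0.70, "C": 0.78},
    "ds3": {"A": 0.75, "B": 0.72, "C": 0.74},
    "ds4": {"A": 0.66, "B": 0.64, "C": 0.65},
}
METHODS = ["A", "B", "C"]


def ranks_desc(row):
    """Rank methods on one data set: 1 = highest value; ties get the average rank."""
    ordered = sorted(row, key=row.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the block of methods tied with ordered[i].
        while j + 1 < len(ordered) and row[ordered[j + 1]] == row[ordered[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for m in ordered[i:j + 1]:
            ranks[m] = avg
        i = j + 1
    return ranks


def mean_value(perf):
    """Aggregate by averaging raw performance values over data sets."""
    return {m: mean(row[m] for row in perf.values()) for m in METHODS}


def mean_rank(perf):
    """Aggregate by averaging per-data-set ranks over data sets."""
    per_ds = [ranks_desc(row) for row in perf.values()]
    return {m: mean(r[m] for r in per_ds) for m in METHODS}


by_value = mean_value(PERF)
by_rank = mean_rank(PERF)
best_by_value = max(by_value, key=by_value.get)  # higher accuracy is better
best_by_rank = min(by_rank, key=by_rank.get)     # lower mean rank is better
print("best by mean accuracy:", best_by_value)
print("best by mean rank:", best_by_rank)
```

Here the mean accuracy favours B (one large win), whereas the mean rank favours A (consistently first on three of four data sets). Rank-based summaries are less sensitive to a single extreme data set but discard the size of performance differences.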
“…Recently, it has been noted in the context of data analysis that there is a tendency to over-optimistic reporting of the performance of new methods and a lack of neutral comparison studies in the literature, see e.g. Boulesteix (2015); Boulesteix et al (2017, 2013); Van Mechelen et al (2018); Weber et al (2019); Buchka et al (2021); Nießl et al (2021); Pawel et al (2022). Neutral comparison studies, however, are essential to guarantee a fair comparison of existing methods across different scenarios, thus allowing an applied researcher to determine the best method for her or his situation.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
“…The goal of this study is to provide an overview of existing approaches for encoding categorical predictor variables and to study their effect on a model's predictive performance. Following calls in the computational statistics community for neutral benchmark studies (Boulesteix et al 2017), which do not introduce a new method, thus reducing the risk of cherry picking methods (Dehghani et al 2021) and reporting over-optimistic performance (Nießl et al 2021), we present a carefully designed experimental setting to discern the effect of encoding strategies and their interaction with different ML algorithms.…”
Section: Introduction (citation type: mentioning; confidence: 99%)