ESANN 2021 Proceedings 2021
DOI: 10.14428/esann/2021.es2021-122

Judging competitions and benchmarks: a candidate election approach

Abstract: Machine learning progress relies on algorithm benchmarks. We study the problem of declaring a winner, or ranking "candidate" algorithms, based on results obtained by "judges" (scores on various tasks). Inspired by social science and game theory on fair elections, we compare various ranking functions, ranging from simple score averaging to Condorcet methods. We devise novel empirical criteria to assess the quality of ranking functions, including the generalization to new tasks and the stability under judge or c…
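To make the contrast between score averaging and Condorcet-style ranking concrete, here is a minimal Python sketch. It is not the paper's implementation: the Copeland rule is used as one well-known Condorcet-consistent method, and the score matrix, the function names `mean_ranking` and `copeland_ranking`, and the tie handling are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical score matrix: rows are "judges" (tasks), columns are
# "candidates" (algorithms A, B, C). Higher scores are better.
scores = np.array([
    [0.95, 0.50, 0.40],  # judge 0: A wins by a wide margin
    [0.50, 0.55, 0.40],  # judge 1: B narrowly beats A
    [0.50, 0.55, 0.40],  # judge 2: B narrowly beats A
])


def mean_ranking(scores):
    """Score each candidate by its average over all judges."""
    return scores.mean(axis=0)


def copeland_ranking(scores):
    """Score each candidate by the number of pairwise majority duels it wins.

    A candidate wins a duel against another if more judges score it strictly
    higher than the reverse. A tied duel counts as half a win (an assumption;
    other tie conventions exist).
    """
    n_candidates = scores.shape[1]
    wins = np.zeros(n_candidates)
    for a in range(n_candidates):
        for b in range(n_candidates):
            if a == b:
                continue
            a_preferred = np.sum(scores[:, a] > scores[:, b])
            b_preferred = np.sum(scores[:, b] > scores[:, a])
            if a_preferred > b_preferred:
                wins[a] += 1.0
            elif a_preferred == b_preferred:
                wins[a] += 0.5
    return wins


mean_winner = int(np.argmax(mean_ranking(scores)))
copeland_winner = int(np.argmax(copeland_ranking(scores)))
print("mean winner:", mean_winner, "Copeland winner:", copeland_winner)
```

On this toy data the two rules disagree: averaging favors candidate A because of its large margin on a single judge, while the pairwise majority rule favors candidate B, which most judges prefer to A. This kind of disagreement is what makes the choice of ranking function matter.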

Citations: cited by 1 publication (1 citation statement)
References: 6 publications (2 reference statements)
“…We do not address here the process of generating a sound and reproducible ranking. However, many related problems have been investigated in the literature: the test set size needed to get good error rate estimations [4], score distributions for stochastic algorithms [10], the selection of the worst run to reduce chance in competitions [2], or the problem of fusing scores from multiple "judges" (multiple tasks and/or multiple metrics) [8].…”
Section: Related Problems and Related Work (mentioning)
confidence: 99%