2019
DOI: 10.1109/tkde.2018.2840708
A General Theory of IR Evaluation Measures

Cited by 24 publications (44 citation statements)
References 21 publications
“…by using correlation analysis [119], discriminative power [86,87], or robustness to pool downsampling [16,124]. On the other hand, few studies have been undertaken to understand the formal properties of evaluation measures and they have just scratched the surface of the problem: [13,19,7,110,39,40,41].…”
Section: Experimental Evaluation in IR | Citation type: mentioning
confidence: 99%
“…Scales have increasing properties: a nominal scale allows for determination of equality and for the computation of the mode; an ordinal scale allows only for determination of greater or less and for the computation of medians and percentiles; an interval scale allows also for determination of equality of intervals or differences and for the computation of mean, standard deviation, and rank-order correlation; finally, a ratio scale allows also for the determination of equality of ratios and for the computation of coefficient of variation. Recently, Ferrante et al [40,41] have theoretically shown that some of the most known and used IR measures, like Average Precision (AP) or Discounted Cumulative Gain (DCG), are not interval-scales. As a consequence, according to Stevens's prescriptions, we should neither compute means, standard deviations and confidence intervals, nor perform significance tests that require an interval scale.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
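The scale hierarchy in the statement above can be sketched as a small lookup, shown here only as an illustration. The names `SCALES`, `NEW_STATS`, `permissible_statistics`, and `summarize` are hypothetical and do not come from the cited papers; the mapping of statistics to scales follows the quoted passage.

```python
import statistics

# Scale hierarchy as listed in the quoted passage; each scale is assumed
# to inherit the statistics permitted on the scales before it.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Statistics first permitted at each scale, per the quoted passage.
NEW_STATS = {
    "nominal": {"mode"},
    "ordinal": {"median", "percentiles"},
    "interval": {"mean", "standard deviation", "rank-order correlation"},
    "ratio": {"coefficient of variation"},
}


def permissible_statistics(scale):
    """Return every statistic allowed at `scale`, including inherited ones."""
    idx = SCALES.index(scale)
    allowed = set()
    for s in SCALES[: idx + 1]:
        allowed |= NEW_STATS[s]
    return allowed


def summarize(scores, scale):
    """Compute only the summary statistics that `scale` permits."""
    allowed = permissible_statistics(scale)
    summary = {}
    if "median" in allowed:
        summary["median"] = statistics.median(scores)
    if "mean" in allowed:
        summary["mean"] = statistics.mean(scores)
    return summary


# Hypothetical per-topic scores of a measure treated as ordinal only:
# the mean is deliberately withheld.
print(summarize([0.2, 0.5, 0.9], "ordinal"))
```

Under this reading, a measure that is not an interval scale would be summarized by its median rather than its mean.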
“…This paper follows in the tradition of the so-called "axiomatic" approach to "evaluating evaluation" in information retrieval (see e.g., (Amigó et al 2011; Busin and Mizzaro 2013; Ferrante et al 2015, 2018; Moffat 2013; Sebastiani 2015)), which is based on describing (and often: arguing in favour of) a number of properties (that most of this literature calls - perhaps improperly - "axioms") that an evaluation measure for the task being considered should intuitively satisfy. The benefit of this approach is that it shifts the discussion from the evaluation measures to their properties, which amounts to shifting the discussion from a complex construction to its building blocks: once the scientific community has agreed on a set of properties (the building blocks), it then follows whether a given measure (the construction) is satisfactory or not.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…Evaluation measures are an intrinsic part of experimental evaluation. Even if a growing attention is called in the field for developing stronger theoretical foundations [1, 2, 3, 9, 12], they are often formulated and justified in a somewhat informal and intuitive way rather than being based on well-founded mathematical models. Carterette [4] has made a post-hoc attempt to propose a unifying framework which explains modern evaluation measures based on three components: a browsing model, a model of document utility, and a utility accumulation model.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
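The three components named in the statement above can be illustrated with a minimal sketch that assembles Discounted Cumulative Gain (DCG) from them. This decomposition is an assumption made for illustration, not a reproduction of Carterette's formulation: the function names are hypothetical, the browsing model is taken to be the usual 1 / log2(rank + 1) discount, and document utility is taken to be the raw relevance grade.

```python
import math


def log_discount(rank):
    """Browsing model (assumed): attention decays as 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1)


def graded_gain(relevance):
    """Document utility model (assumed): utility equals the relevance grade."""
    return float(relevance)


def accumulate(relevances, gain=graded_gain, discount=log_discount):
    """Utility accumulation model (assumed): sum of discounted gains.

    `relevances` lists the relevance grade of each retrieved document,
    in rank order starting at rank 1.
    """
    return sum(
        gain(rel) * discount(rank)
        for rank, rel in enumerate(relevances, start=1)
    )


# Hypothetical ranking: grades 3, 0, 1 at ranks 1, 2, 3.
dcg = accumulate([3, 0, 1])
print(dcg)  # 3/log2(2) + 0/log2(3) + 1/log2(4) = 3.5
```

Swapping in a different `gain` or `discount` function yields other measures of the same family, which is the sense in which such a framework is described as unifying.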