Proceedings of the 30th ACM International Conference on Information & Knowledge Management 2021
DOI: 10.1145/3459637.3482428

Evaluating Relevance Judgments with Pairwise Discriminative Power

Cited by 2 publications (4 citation statements); references 25 publications.

“…In future work, we plan to extend our study to other IR metrics like bpref [11] and infAP [57] for KGC evaluation. Other factors over meta-evaluation on metrics will also be examined, such as pairwise discriminative power [16], user satisfaction [15,29,59], and diversity [4,40].…”
Section: Discussion (mentioning; confidence: 99%)

“…Additionally, a comparison conducted by Yang et al (2018) reveals that preference judgment is more reliable than other paradigms. Chu et al (2021) propose a combined evaluation metric named pairwise discriminative power (PDP) to evaluate the quality of relevance judgment collections with both pair-wise signals and point-wise signals. A novel combined metric proposed by Arabzadeh et al (2023) is applicable for instant search rather than offline search.…”
Section: Relevance Judgment (mentioning; confidence: 99%)

“…For example, to the best of our knowledge, there is no universal grading scheme (i.e., how many levels to use and what those levels mean) in point-wise relevance judgment (Xie et al, 2020). Different numerical scales will significantly affect evaluation performance in various scenarios (Chu et al, 2021), as they determine the granularity of judgment and the interpretation of each level, which hurts the reliability of point-wise relevance judgment in practice.…”
Section: Introduction (mentioning; confidence: 99%)