Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval 2011
DOI: 10.1145/2009916.2010055
Evaluating diversified search results using per-intent graded relevance

Abstract: Search queries are often ambiguous and/or underspecified. To accommodate different user needs, search result diversification has received attention in the past few years. Accordingly, several new metrics for evaluating diversification have been proposed, but their properties are little understood. We compare the properties of existing metrics given the premises that (1) queries may have multiple intents; (2) the likelihood of each intent given a query is available; and (3) graded relevance assessments are avail…

Cited by 99 publications (99 citation statements)
References 24 publications
“…This simple approach has several drawbacks: the IA metric as defined above does not fully range between 0 and 1; in general it does not necessarily encourage diversity relative to relevance [9,24]; and it has limited discriminative power, i.e. the ability to draw reliable conclusions from statistical tests in an experiment [24,25].…”
Section: Diversity Evaluation Metrics
confidence: 99%
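The intent-aware (IA) scheme criticised in this statement composes a standard metric per intent and averages the per-intent scores by intent probability. A minimal Python sketch, with hypothetical intent probabilities and per-intent scores, illustrates the range issue: when intents need different documents, no ranking scores 1 for every intent at once, so the composite cannot reach 1.

```python
def ia_metric(per_intent_scores, intent_probs):
    """Intent-aware composite: intent-probability-weighted average of a
    standard metric computed separately against each intent's assessments."""
    return sum(intent_probs[i] * per_intent_scores[i] for i in intent_probs)

# Hypothetical values: intent i1 is served perfectly, i2 poorly.
probs = {"i1": 0.6, "i2": 0.4}
scores = {"i1": 1.0, "i2": 0.3}  # per-intent nDCG values (illustrative)
ia = ia_metric(scores, probs)    # 0.6*1.0 + 0.4*0.3 = 0.72, short of 1
```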
“…Pr(i|q) = 1/|{i}| for any i) and graded relevance assessments are not utilised, the NTCIR INTENT task utilises these types of information by leveraging the "D♯" evaluation framework of Sakai and Song [24]. More specifically, a diversity version of normalised discounted cumulative gain (nDCG) [13] called D-nDCG is computed, based on the global gain, which consolidates per-intent graded relevance assessments and intent probabilities for each document.…”
Section: Diversity Evaluation Metrics
confidence: 99%
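The D-nDCG computation described here — a per-document global gain, then standard nDCG over those gains — can be sketched as follows. The intent probabilities and gain values are illustrative, and the plain log2 discount is one common nDCG variant rather than necessarily the exact formulation of [24].

```python
import math

def global_gain(per_intent_gains, intent_probs):
    """Global gain of one document: intent-probability-weighted sum
    of its per-intent graded relevance gains."""
    return sum(intent_probs[i] * g for i, g in per_intent_gains.items())

def d_ndcg(ranked_gains, ideal_gains, cutoff):
    """nDCG computed over global gains (D-nDCG)."""
    def dcg(gains):
        # rank is 0-based, so the discount at the top rank is log2(2) = 1
        return sum(g / math.log2(rank + 2)
                   for rank, g in enumerate(gains[:cutoff]))
    ideal = dcg(sorted(ideal_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

# Example: two intents with Pr(i1|q)=0.6, Pr(i2|q)=0.4; three ranked
# documents with per-intent graded relevance levels (illustrative).
probs = {"i1": 0.6, "i2": 0.4}
docs = [{"i1": 3, "i2": 0}, {"i1": 0, "i2": 2}, {"i1": 1, "i2": 1}]
gains = [global_gain(d, probs) for d in docs]   # [1.8, 0.8, 1.0]
score = d_ndcg(gains, gains, cutoff=3)
```

Because the global gain collapses multiple intents into a single number per document, any standard single-relevance metric can then be applied unchanged on top of it.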
“…Both ERR-IA and α-nDCG have been shown to reward rankings that achieve a balance of coverage and novelty (Clarke et al 2011). Moreover, α-nDCG has been shown to possess a discriminative power at least as high as that of the traditional nDCG (Sakai and Song 2011). Following the standard TREC setting, unless otherwise noted, both metrics are reported at rank cutoff 20 (Clarke et al 2010).…”
Section: Evaluation Metrics
confidence: 99%
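α-nDCG rewards the coverage/novelty balance mentioned above by discounting an intent's gain each time that intent reappears further down the ranking. A minimal sketch, assuming binary per-intent relevance and α = 0.5; normalisation by an ideal ranking (itself hard to compute exactly and usually greedily approximated) is omitted here.

```python
import math

def novelty_gain(doc_intents, seen_counts, alpha=0.5):
    """Novelty-biased gain: each intent the document covers contributes
    (1 - alpha) ** (times that intent was already seen higher up)."""
    return sum((1 - alpha) ** seen_counts.get(i, 0) for i in doc_intents)

def alpha_dcg(ranking, alpha=0.5, cutoff=20):
    """Discounted sum of novelty-biased gains (unnormalised alpha-DCG).
    Each element of `ranking` is the set of intents a document covers."""
    seen, score = {}, 0.0
    for rank, doc_intents in enumerate(ranking[:cutoff]):
        score += novelty_gain(doc_intents, seen, alpha) / math.log2(rank + 2)
        for i in doc_intents:
            seen[i] = seen.get(i, 0) + 1
    return score

# A ranking that interleaves intents scores higher than one that repeats
# the same intent before covering a new one.
diverse = alpha_dcg([{"a"}, {"b"}, {"a"}])
redundant = alpha_dcg([{"a"}, {"a"}, {"b"}])
```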
“…As a result, their approach effectively determines when and how to diversify the results for an unseen query. Sakai et al [14,15] proposed an alternative way to evaluate diversified search results, given intent probabilities and per-intent graded relevance assessments.…”
Section: Introduction
confidence: 99%