Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval 2011
DOI: 10.1145/2009916.2010055
Evaluating diversified search results using per-intent graded relevance

Abstract: Search queries are often ambiguous and/or underspecified. To accommodate different user needs, search result diversification has received attention in the past few years. Accordingly, several new metrics for evaluating diversification have been proposed, but their properties are little understood. We compare the properties of existing metrics given the premises that (1) queries may have multiple intents; (2) the likelihood of each intent given a query is available; and (3) graded relevance assessments are avail…

Cited by 99 publications (99 citation statements)
References 24 publications
“…This simple approach has several drawbacks: the IA metric as defined above does not fully range between 0 and 1; in general it does not necessarily encourage diversity relative to relevance [9,24]; and it has limited discriminative power, i.e. the ability to draw reliable conclusions from statistical tests in an experiment [24,25].…”
Section: Diversity Evaluation Metrics
confidence: 99%
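The intent-aware (IA) scheme criticised in this statement composes a standard metric per intent and averages the per-intent scores by intent probability. A minimal Python sketch, with hypothetical intent probabilities and per-intent scores, illustrates the range issue: when intents need different documents, no ranking scores 1 for every intent at once, so the composite cannot reach 1.

```python
def ia_metric(per_intent_scores, intent_probs):
    """Intent-aware composite: intent-probability-weighted average of a
    standard metric computed separately against each intent's assessments."""
    return sum(intent_probs[i] * per_intent_scores[i] for i in intent_probs)

# Hypothetical values: intent i1 is served perfectly, i2 poorly.
probs = {"i1": 0.6, "i2": 0.4}
scores = {"i1": 1.0, "i2": 0.3}  # per-intent nDCG values (illustrative)
ia = ia_metric(scores, probs)    # 0.6*1.0 + 0.4*0.3 = 0.72, short of 1
```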
“…Pr(i|q) = 1/|{i}| for any i) and graded relevance assessments are not utilised, the NTCIR INTENT task utilises these types of information by leveraging the "D♯" evaluation framework of Sakai and Song [24]. More specifically, a diversity version of normalised discounted cumulative gain (nDCG) [13] called D-nDCG is computed, based on the global gain, which consolidates per-intent graded relevance assessments and intent probabilities for each document.…”
Section: Diversity Evaluation Metrics
confidence: 99%
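The D-nDCG computation described here — a per-document global gain, then standard nDCG over those gains — can be sketched as follows. The intent probabilities and gain values are illustrative, and the plain log2 discount is one common nDCG variant rather than necessarily the exact formulation of [24].

```python
import math

def global_gain(per_intent_gains, intent_probs):
    """Global gain of one document: intent-probability-weighted sum
    of its per-intent graded relevance gains."""
    return sum(intent_probs[i] * g for i, g in per_intent_gains.items())

def d_ndcg(ranked_gains, ideal_gains, cutoff):
    """nDCG computed over global gains (D-nDCG)."""
    def dcg(gains):
        # rank is 0-based, so the discount at the top rank is log2(2) = 1
        return sum(g / math.log2(rank + 2)
                   for rank, g in enumerate(gains[:cutoff]))
    ideal = dcg(sorted(ideal_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

# Example: two intents with Pr(i1|q)=0.6, Pr(i2|q)=0.4; three ranked
# documents with per-intent graded relevance levels (illustrative).
probs = {"i1": 0.6, "i2": 0.4}
docs = [{"i1": 3, "i2": 0}, {"i1": 0, "i2": 2}, {"i1": 1, "i2": 1}]
gains = [global_gain(d, probs) for d in docs]   # [1.8, 0.8, 1.0]
score = d_ndcg(gains, gains, cutoff=3)
```

Because the global gain collapses multiple intents into a single number per document, any standard single-relevance metric can then be applied unchanged on top of it.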
“…Both ERR-IA and α-nDCG have been shown to reward rankings that achieve a balance of coverage and novelty (Clarke et al 2011). Moreover, α-nDCG has been shown to possess a discriminative power at least as high as that of the traditional nDCG (Sakai and Song 2011). Following the standard TREC setting, unless otherwise noted, both metrics are reported at rank cutoff 20 (Clarke et al 2010).…”
Section: Evaluation Metrics
confidence: 99%
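α-nDCG rewards the coverage/novelty balance mentioned above by discounting an intent's gain each time that intent reappears further down the ranking. A minimal sketch, assuming binary per-intent relevance and α = 0.5; normalisation by an ideal ranking (itself hard to compute exactly and usually greedily approximated) is omitted here.

```python
import math

def novelty_gain(doc_intents, seen_counts, alpha=0.5):
    """Novelty-biased gain: each intent the document covers contributes
    (1 - alpha) ** (times that intent was already seen higher up)."""
    return sum((1 - alpha) ** seen_counts.get(i, 0) for i in doc_intents)

def alpha_dcg(ranking, alpha=0.5, cutoff=20):
    """Discounted sum of novelty-biased gains (unnormalised alpha-DCG).
    Each element of `ranking` is the set of intents a document covers."""
    seen, score = {}, 0.0
    for rank, doc_intents in enumerate(ranking[:cutoff]):
        score += novelty_gain(doc_intents, seen, alpha) / math.log2(rank + 2)
        for i in doc_intents:
            seen[i] = seen.get(i, 0) + 1
    return score

# A ranking that interleaves intents scores higher than one that repeats
# the same intent before covering a new one.
diverse = alpha_dcg([{"a"}, {"b"}, {"a"}])
redundant = alpha_dcg([{"a"}, {"a"}, {"b"}])
```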
“…As a result, their approach effectively determines when and how to diversify the results for an unseen query. Sakai et al [14,15] proposed an alternative way to evaluate diversified search results, given intent probabilities and per-intent graded relevance assessments.…”
Section: Introduction
confidence: 99%