Efficient construction of large test collections

Cormack, Gordon V.; Palmer, C.; Clarke, Charles L. A.

doi:10.1145/290941.291009

Cited by 159 publications

(102 citation statements)

References 8 publications

Supporting

Mentioning

102

Contrasting

“…Several alternative approaches to the original pooling method have been suggested in order to judge more relevant documents at the same pool depth, e.g. Zobel [21] and Cormack et al [7].…”

Section: Related Work (mentioning)

confidence: 99%

An uncertainty-aware query selection model for evaluation of IR systems

Hosseini

Cox

Milić-Frayling

et al. 2012

Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval

We propose a mathematical framework for query selection as a mechanism for reducing the cost of constructing information retrieval test collections. In particular, our mathematical formulation explicitly models the uncertainty in the retrieval effectiveness metrics that is introduced by the absence of relevance judgments. Since the optimization problem is computationally intractable, we devise an adaptive query selection algorithm, referred to as Adaptive, that provides an approximate solution. Adaptive selects queries iteratively and assumes that no relevance judgments are available for the query under consideration. Once a query is selected, the associated relevance assessments are acquired and then used to aid the selection of subsequent queries. We demonstrate the effectiveness of the algorithm on two TREC test collections as well as a test collection of an online search engine with 1000 queries. Our experimental results show that the queries chosen by Adaptive produce reliable performance ranking of systems. The ranking is better correlated with the actual systems ranking than the rankings produced by queries that were selected using the considered baseline methods.

“…Cormack et al [10] proposed two techniques: iterative searching and judging (ISJ) and move-to-front pooling. With ISJ, relevance assessors perform multiple searches while judging documents for relevance, in order to try and recover as many relevant documents as possible.…”

Section: Related Work (mentioning)

confidence: 99%

“…Several methods have been shown to locate most relevant documents or to estimate conventional measures using a fraction of the currently judged documents; an assessment regime could apply these techniques within the current pooling "budget" and explore a much deeper pool. One such method that we have examined is move-to-front pooling [10]. If we judge the number of documents that would have been judged in a depth-50 pool, but using the move-to-front approach, we would recover 79% of the relevant documents found in the official pool while only judging 48% of the officially-judged nonrelevant documents.…”

Section: Toward Large Reusable Test Collections (mentioning)

confidence: 99%

Bias and the limits of pooling for large collections

Buckley

Dimmick

Soboroff

et al. 2007

Inf Retrieval

Modern retrieval test collections are built through a process called pooling in which only a sample of the entire document set is judged for each topic. The idea behind pooling is to find enough relevant documents such that when unjudged documents are assumed to be nonrelevant the resulting judgment set is sufficiently complete and unbiased. Yet a constant-size pool represents an increasingly small percentage of the document set as document sets grow larger, and at some point the assumption of approximately complete judgments must become invalid. This paper shows that the judgment sets produced by traditional pooling when the pools are too small relative to the total document set size can be biased in that they favor relevant documents that contain topic title words. This phenomenon is wholly dependent on the collection size and does not depend on the number of relevant documents for a given topic. We show that the AQUAINT test collection constructed in the recent TREC 2005 workshop exhibits this biased relevance set; it is likely that the test collections based on the much larger GOV2 document set also exhibit the bias. The paper concludes with suggested modifications to traditional pooling and evaluation methodology that may allow very large reusable test collections to be built.

“…The effects of incomplete relevance assessments, imperfect judgements, potential biases in the relevance pool and the effects of assessor domain expertise in relation to the topic have been investigated in various studies (Cuadra, 1967; Zobel, 1998; Buckley and Voorhees, 2004; Yilmaz and Aslam, 2006; Büttcher et al, 2007; Bailey et al, 2008; Kinney et al, 2008). Approaches to ensure completeness of relevance assessments include using the results from searches conducted manually to generate the pools and supplementing pools with relevant documents found by manually searching the document collection with an IR system, known as Interactive Search and Judge or ISJ (Cormack et al, 1998). Generating relevance assessment is often highly time-consuming and labour intensive. This often leads to a bottleneck in the creation of test collections.…”

Section: Relevance Assessments (mentioning)

confidence: 99%