Lecture Notes in Computer Science
DOI: 10.1007/978-3-540-78646-7_8
Effective Pre-retrieval Query Performance Prediction Using Similarity and Variability Evidence

Cited by 111 publications (183 citation statements)
References 13 publications
“…Later, Scholer et al [5] reported that using the maximum IDF value of any term in a query gives the best correlation on the TREC web data. These results were confirmed and extended to other TREC collections [7].…”
Section: Related Work (supporting)
confidence: 77%
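The MaxIDF predictor referred to in this statement can be computed from collection statistics alone. Below is a minimal sketch with hypothetical function and input names, assuming the standard form IDF(t) = log(N / df(t)):

```python
import math

def max_idf(query_terms, doc_freq, num_docs):
    """Maximum IDF over the query terms, assuming IDF(t) = log(N / df(t)).

    doc_freq maps a term to its document frequency in the collection;
    terms that never occur in the collection are skipped.
    """
    idfs = [math.log(num_docs / doc_freq[t])
            for t in query_terms if doc_freq.get(t, 0) > 0]
    return max(idfs) if idfs else 0.0

# Example with made-up statistics: the rarer term dominates the score.
# max_idf(["query", "performance"], {"query": 1200, "performance": 30}, 50000)
```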
“…More recently, Zhao et al [7] presented two families of pre-retrieval predictors. The first is based on the similarity between a query and the overall document collection, the second focuses on the variability in how query terms are distributed across documents.…”
Section: Related Work (mentioning)
confidence: 99%
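The similarity-based family mentioned above (SumSCQ, AvSCQ, MaxSCQ) is likewise computable before retrieval. A sketch assuming the commonly cited per-term form SCQ(t) = (1 + ln cf(t)) · ln(1 + N / df(t)); the function and input names are illustrative:

```python
import math

def scq_predictors(query_terms, coll_freq, doc_freq, num_docs):
    """Collection-query similarity predictors (SumSCQ, AvSCQ, MaxSCQ),
    assuming the commonly cited per-term form
        SCQ(t) = (1 + ln cf(t)) * ln(1 + N / df(t)),
    where cf(t) is the collection frequency of t, df(t) its document
    frequency, and N the number of documents. Terms absent from the
    collection contribute 0.
    """
    if not query_terms:
        return 0.0, 0.0, 0.0
    per_term = []
    for t in query_terms:
        cf, df = coll_freq.get(t, 0), doc_freq.get(t, 0)
        score = (1 + math.log(cf)) * math.log(1 + num_docs / df) if cf and df else 0.0
        per_term.append(score)
    return sum(per_term), sum(per_term) / len(per_term), max(per_term)
```

SumSCQ rewards long queries with many well-represented terms, AvSCQ normalises that out, and MaxSCQ reduces the query to its single best-matching term.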
“…The features used to train this classifier included query performance estimators (Average Inverse Collection Term Frequency (AvICTF) [12], Simplified Clarity Score (SCS) [11]), the derivatives of the similarity score between collection and query (SumSCQ, AvSCQ, MaxSCQ) [27], result set size, and the un-normalised BM25 document scores for the top five documents. Their classifier achieved 78% accuracy on the FAQ SMS training data using leave-one-out validation.…”
Section: Related Work (mentioning)
confidence: 99%
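For context, the two estimators named first in that statement (AvICTF and SCS) can be sketched as follows, assuming the usual maximum-likelihood definitions; the helper name and inputs are hypothetical:

```python
import math
from collections import Counter

def scs_and_avictf(query_terms, coll_freq, total_tokens):
    """Simplified Clarity Score (SCS) and Average Inverse Collection Term
    Frequency (AvICTF), assuming the usual maximum-likelihood estimates:
        P_ml(t|q) = qtf(t) / |q|,   P_coll(t) = cf(t) / |C|,
        SCS    = sum_t P_ml(t|q) * log2(P_ml(t|q) / P_coll(t)),
        AvICTF = (1 / |q|) * sum_t log2(|C| / cf(t)).
    Query terms missing from the collection are ignored.
    """
    qtf = Counter(query_terms)
    qlen = len(query_terms)
    scs = avictf = 0.0
    for term, f in qtf.items():
        cf = coll_freq.get(term, 0)
        if cf == 0:
            continue
        p_q = f / qlen                      # term probability in the query
        p_c = cf / total_tokens             # term probability in the collection
        scs += p_q * math.log2(p_q / p_c)
        avictf += f * math.log2(total_tokens / cf) / qlen
    return scs, avictf
```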
“…Seven of these predictors were pre-retrieval predictors and these were: Average Pointwise Mutual Information (AvPMI) [10], Simplified Clarity Score (SCS) [11], Average Inverse Collection Term Frequency (AvICTF) [12], Average Inverse Document Frequency (AvIDF) [10], and the derivatives of the similarity score between collection and query (SumSCQ, AvSCQ, MaxSCQ) [27]. One post-retrieval predictor was used, the Clarity Score (CS) [5].…”
Section: Creating Training and Testing Instances For Missing Content (mentioning)
confidence: 99%
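Of the predictors listed in that statement, AvPMI is the only one not expressible from single-term statistics; it averages pointwise mutual information over pairs of query terms. The sketch below assumes document-level co-occurrence counts are available; the pair_doc_freq input is a hypothetical pre-computed statistic:

```python
import math
from itertools import combinations

def av_pmi(query_terms, doc_freq, pair_doc_freq, num_docs):
    """Average Pointwise Mutual Information over all pairs of distinct query
    terms, using document-level co-occurrence estimates:
        PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ),  with P(.) = df(.) / N.
    pair_doc_freq maps frozenset({a, b}) to the number of documents containing
    both terms (a hypothetical pre-computed statistic); pairs with zero
    co-occurrence or unseen terms are skipped.
    """
    pmis = []
    for a, b in combinations(sorted(set(query_terms)), 2):
        co = pair_doc_freq.get(frozenset((a, b)), 0)
        if co and doc_freq.get(a) and doc_freq.get(b):
            p_ab = co / num_docs
            p_a, p_b = doc_freq[a] / num_docs, doc_freq[b] / num_docs
            pmis.append(math.log2(p_ab / (p_a * p_b)))
    return sum(pmis) / len(pmis) if pmis else 0.0
```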