Oren Kurland scite author profile

Most previous work on the recently developed languagemodeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into account the structure of the surrounding corpus. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the incorporation of information drawn from clusters of similar documents. Using this framework, we develop a suite of new algorithms. Even the simplest typically outperforms the standard language-modeling approach in precision and recall, and our new interpolation algorithm posts statistically significant improvements for both metrics over all three corpora tested.

show abstract

Predicting Query Performance by Query-Drift Estimation

Shtok

Kurland

Carmel

2009

166

View full text Add to dashboard Cite

Predicting query performance, that is, the effectiveness of a search performed in response to a query, is a highly important and challenging problem. Our novel approach to addressing this challenge is based on estimating the potential amount of query drift in the result list, i.e., the presence (and dominance) of aspects or topics not related to the query in top-retrieved documents. We argue that query-drift can potentially be estimated by measuring the diversity (e.g., standard deviation) of the retrieval scores of these documents. Empirical evaluation demonstrates the prediction effectiveness of our approach for several retrieval models. Specifically, the prediction success is better, over most tested TREC corpora, than that of state-of-the-art prediction methods.

show abstract

Query Expansion Using Word Embeddings

2016

View full text Add to dashboard Cite

Using statistical decision theory and relevance models for query-performance prediction

Shtok

Kurland

Carmel

2010

View full text Add to dashboard Cite

We present a novel framework for the query-performance prediction task. That is, estimating the effectiveness of a search performed in response to a query in lack of relevance judgments. Our approach is based on using statistical decision theory for estimating the utility that a document ranking provides with respect to an information need expressed by the query. To address the uncertainty in inferring the information need, we estimate utility by the expected similarity between the given ranking and those induced by relevance models; the impact of a relevance model is based on its presumed representativeness of the information need. Specific query-performance predictors instantiated from the framework substantially outperform state-of-the-art predictors over five TREC corpora.

show abstract

Utilizing Passage-Based Language Models for Document Retrieval

Bendersky

Kurland

View full text Add to dashboard Cite

Abstract. We show that several previously proposed passage-based document ranking principles, along with some new ones, can be derived from the same probabilistic model. We use language models to instantiate specific algorithms, and propose a passage language model that integrates information from the ambient document to an extent controlled by the estimated document homogeneity. Several document-homogeneity measures that we propose yield passage language models that are more effective than the standard passage model for basic document retrieval and for constructing and utilizing passage-based relevance models; the latter outperform a document-based relevance model. We also show that the homogeneity measures are effective means for integrating documentquery and passage-query similarity information for document retrieval.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Oren Kurland

Corpus structure, language models, and ad hoc information retrieval

Predicting Query Performance by Query-Drift Estimation

Query Expansion Using Word Embeddings

Using statistical decision theory and relevance models for query-performance prediction

Utilizing Passage-Based Language Models for Document Retrieval

Contact Info

Product

Resources

About