2009
DOI: 10.1007/s10791-009-9118-8

Utilizing passage-based language models for ad hoc document retrieval

Abstract: To cope with the fact that, in the ad hoc retrieval setting, documents relevant to a query could contain very few (short) parts (passages) with query-related information, researchers proposed passage-based document ranking approaches. We show that several of these retrieval methods can be understood, and new ones can be derived, using the same probabilistic model. We use language-model estimates to instantiate specific retrieval algorithms, and in doing so present a novel passage language model that integrates…

Cited by 10 publications (18 citation statements, published 2011–2021) · References 50 publications
“…The line of work most related to ours is on passage-based document retrieval [3,4,7,13,21,24,25,28,30,32,41,44,53-55]. As already noted, the most commonly used passage-based document retrieval methods are ranking a document by the maximum query-similarity of its passages [4,7,24,25,32,44,55] and by interpolating this similarity with the document-query similarity [4,7,44,55]. We show that our best-performing methods substantially outperform a highly effective method that integrates document-query and passage-query similarities [4].…”
Section: Related Work (mentioning)
Confidence: 99%
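The two baseline methods this quote refers to, max-passage scoring and its interpolation with the document-query score, can be sketched in a few lines of Python. This is a minimal illustration only: it assumes query-document and query-passage similarity scores have already been computed, and the function names and default weight are ours, not taken from the cited papers.

def max_passage_score(passage_scores):
    """Score a document by the highest query similarity among its passages."""
    return max(passage_scores)


def interpolated_score(doc_score, passage_scores, lam=0.5):
    """Mix the document-query score with the best passage-query score.

    lam is a free interpolation weight in [0, 1]; the cited work tunes it
    empirically, and 0.5 here is only a placeholder.
    """
    return lam * doc_score + (1.0 - lam) * max(passage_scores)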
“…As a result, there has been a large body of work on passage-based document retrieval: utilizing information induced from document passages to rank the documents; e.g., [4,7,25,32,55]. The most commonly used passage-based document retrieval methods rank a document by the highest query similarity exhibited by any of its passages [4,7,25,32,55] and by integrating this similarity with the document-query similarity [4,7,55].…”
Section: Introduction (mentioning)
Confidence: 99%
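In equation form (the notation is ours, not quoted from the cited papers): let S(q, d) denote the document-query similarity and S(q, g) the similarity of passage g of document d. The two common methods described above are then

\[
\mathrm{Score}_{\max}(d;q) = \max_{g \in d} S(q,g),
\qquad
\mathrm{Score}_{\mathrm{int}}(d;q) = \lambda\, S(q,d) + (1-\lambda)\,\max_{g \in d} S(q,g),
\]

where \(\lambda \in [0,1]\) is a free interpolation parameter.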
“…Expansion terms are selected from hand-crafted thesauri such as WordNet [10], co-occurrence-based similarity thesauri [15], highly-ranked retrieved documents (i.e., pseudo-relevance feedback) [23,43], highly-ranked retrieved passages [2,26], or external collections such as the Web or Wikipedia [9,42]. Document expansion has a similar motivation to query expansion, but the expansion is applied to the documents rather than to the query [24,21].…”
Section: Monolingual Retrieval (mentioning)
Confidence: 99%
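As a rough sketch of the pseudo-relevance feedback idea mentioned in this quote: terms are pooled from the top-ranked documents of an initial retrieval run, and the most frequent non-query terms are appended to the query. The function below is illustrative only; its name, parameters, and the simple frequency criterion are assumptions of ours, not a description of any cited system.

from collections import Counter

def expand_query(query_terms, top_docs, num_expansion_terms=10):
    """Naive pseudo-relevance feedback: append frequent terms from top-ranked docs.

    top_docs is a list of tokenized documents (lists of terms), assumed to be
    the highest-ranked results of an initial retrieval run.
    """
    pool = Counter()
    for doc in top_docs:
        pool.update(doc)
    # Skip terms already in the query, then keep the most frequent candidates.
    candidates = [t for t, _ in pool.most_common() if t not in query_terms]
    return list(query_terms) + candidates[:num_expansion_terms]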
“…To deal with these problems, many studies in IR have investigated word sense disambiguation (WSD) on queries and documents [20,40,34,35,13,29,36,16], or have performed query expansion [15,23,43,9,10,42,2,26] or document expansion [3,24,21] by appending semantically related words to the original query or document. Some of these approaches (e.g., pseudo-relevance feedback) have shown marked improvements in retrieval performance.…”
Section: Introduction (mentioning)
Confidence: 99%
“…One of the effective approaches is passage retrieval, in which the relevance score of a document is boosted by an additional score estimated from passage-level evidence. Passage retrieval has been shown to significantly improve over baselines that use only traditional document-level evidence (Callan 1994; Kaszkiel and Zobel 1997; Kaszkiel and Zobel 2001; Salton et al. 1993; Na et al. 2008b; Bendersky and Kurland 2010).…”
Section: Introduction (mentioning)
Confidence: 99%
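The passage-level evidence mentioned here is often computed over fixed-length, possibly overlapping word windows (e.g., in the style of Callan 1994). A minimal sketch under that assumption; the window and overlap sizes below are placeholders, not values taken from any cited paper.

def fixed_length_passages(text, window=150, overlap=75):
    """Split a document into overlapping fixed-length word windows.

    Each window can then be scored against the query, and the best passage
    score combined with the document-level score as in the interpolation
    formula above. Sizes are illustrative placeholders.
    """
    tokens = text.split()
    step = max(window - overlap, 1)
    passages = []
    for start in range(0, max(len(tokens) - overlap, 1), step):
        passages.append(tokens[start:start + window])
    return passages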