Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1999
DOI: 10.1145/312624.312687

Cluster-based language models for distributed retrieval

Abstract: Effective retrieval in a distributed environment is an important but difficult problem. Lack of effectiveness appears to have three causes. First, collection selection based on word histograms is not appropriate for heterogeneous collections. Second, relevant documents are scattered over many collections, and searching a few collections misses many relevant documents. Third, most existing collection selection metrics lack sound theoretical justifications and hence may not be well tuned to the problem. We propose a ne…
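The abstract contrasts histogram-based collection selection with a language-model view. Below is a minimal sketch of one generic language-model approach. It assumes documents have already been grouped into clusters, builds a unigram model per cluster, and ranks clusters by query log-likelihood. Each cluster model is smoothed by linear interpolation with a global model. Because the abstract is truncated, this should not be read as the paper's exact formulation. `build_lm`, `rank_clusters`, and the weight `lam` are hypothetical names.

```python
import math
from collections import Counter

def build_lm(docs):
    """Maximum-likelihood unigram model from a non-empty list of tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def rank_clusters(query, clusters, lam=0.5):
    """Rank clusters by smoothed query log-likelihood, best first.

    clusters: dict mapping a cluster name to a list of tokenized documents.
    lam: hypothetical interpolation weight, assumed 0 < lam < 1, between the
         cluster model and a global model built from all clusters.
    """
    global_lm = build_lm([doc for docs in clusters.values() for doc in docs])
    scores = {}
    for name, docs in clusters.items():
        lm = build_lm(docs)
        score = 0.0
        for t in query:
            if t not in global_lm:
                continue  # unseen everywhere: would affect every cluster equally, so skip
            p = lam * lm.get(t, 0.0) + (1 - lam) * global_lm[t]
            score += math.log(p)
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)
```

For example, with a "sports" cluster and a "finance" cluster, the query `["market", "rate"]` ranks "finance" first. Interpolating with the global model keeps a cluster from being eliminated outright just because it lacks one query term.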

Cited by 211 publications (229 citation statements)
References 16 publications
“…Viles and French [9,19] showed that dissemination of collection information increased retrieval effectiveness. Xu and Croft [23] explored cluster-based language models, investigating different ways to construct database selection indexes.…”
Section: Distributed Retrieval Database Selection and Results Merging (mentioning)
confidence: 99%
“…However, in recent work, Xu and Croft [23] discuss the possibility that retrieval performance in a distributed environment may exceed performance in a centralized environment. In that work, Xu and Croft were pessimistic about the potential to achieve both retrieval efficiency and effectiveness in heterogeneous distributed environments.…”
Section: Introduction (mentioning)
confidence: 99%
“…While methods just merging collection documents to get aggregated statistics are among earliest known (Zobel, 1997; Xu and Croft, 1999), the most popular approach uses a task-specific tfidf approach. Document frequency in the collection is used instead of the sum of term frequencies, and inverted document frequency is approximated by inverted collection frequency (Callan et al, 1995).…”
Section: Resource Selection (mentioning)
confidence: 99%
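The statement above describes a tf·idf-style resource selection score. In that score, a term's document frequency within a collection stands in for term frequency, and inverse collection frequency stands in for inverse document frequency. The following is a deliberately simplified sketch of that idea. It is not the full CORI formula of Callan et al., which adds further normalization constants omitted here. Dividing df by the collection size and the `+ 0.5` in the icf term are assumptions. `df_icf_scores` is a hypothetical name.

```python
import math

def df_icf_scores(query, collections):
    """Score collections with a simplified df x icf weighting.

    collections: dict mapping a collection name to a non-empty list of tokenized documents.
    df_c(t): number of documents in collection c that contain t (replaces tf).
    icf(t):  log((C + 0.5) / cf(t)), with C the number of collections and
             cf(t) the number of collections containing t (replaces idf).
    """
    num_colls = len(collections)
    # Per-collection document frequency of every term.
    df = {name: {} for name in collections}
    for name, docs in collections.items():
        for doc in docs:
            for t in set(doc):
                df[name][t] = df[name].get(t, 0) + 1
    scores = {}
    for name, docs in collections.items():
        score = 0.0
        for t in query:
            cf = sum(1 for term_dfs in df.values() if t in term_dfs)
            if t not in df[name]:
                continue  # term absent from this collection contributes nothing
            # Hypothetical normalization: df divided by collection size.
            score += (df[name][t] / len(docs)) * math.log((num_colls + 0.5) / cf)
        scores[name] = score
    return scores
```

Because cf(t) never exceeds C, the icf factor stays positive. Terms that occur in fewer collections therefore contribute more to the score, mirroring the role of idf in document ranking.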
“…When applied to the problem of measuring the distance between two term distributions as in Language Modeling [23], KL estimates the relative entropy between the probability of a term t occurring in the actual collection Θ_c (i.e. p(t|Θ_c)), and the probability of the term t occurring in the estimated Topic Language Model Θ_d (i.e.…”
Section: Document Expansion From Web (mentioning)
confidence: 99%
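The relative entropy in the statement above, taken in the direction D(Θ_c ‖ Θ_d) = Σ_t p(t|Θ_c) · log(p(t|Θ_c) / p(t|Θ_d)), can be sketched directly. Here both models are assumed to be plain dictionaries from term to probability, and `kl_divergence` is a hypothetical name. The sketch also makes the zero-probability problem visible: if Θ_d assigns zero probability to a term that Θ_c uses, the divergence is infinite.

```python
import math

def kl_divergence(p_c, p_d):
    """Relative entropy D(p_c || p_d) = sum_t p_c(t) * log(p_c(t) / p_d(t)).

    p_c, p_d: dicts mapping term -> probability, each assumed to sum to 1.
    Returns math.inf when some term with p_c(t) > 0 has p_d(t) == 0.
    """
    total = 0.0
    for t, pc in p_c.items():
        if pc == 0.0:
            continue  # 0 * log(0 / q) is taken as 0 by convention
        pd = p_d.get(t, 0.0)
        if pd == 0.0:
            return math.inf  # unsmoothed model gives zero probability to a needed term
        total += pc * math.log(pc / pd)
    return total
```

Identical distributions give 0.0. `{"a": 0.5, "b": 0.5}` against `{"a": 0.25, "b": 0.75}` gives 0.5 · log(4/3) ≈ 0.144. Against `{"a": 1.0}`, the result is `math.inf` because "b" is missing.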
“…A nonzero constant α is introduced to alleviate the zero probability [23]. The smaller the KL divergence the closer the document is to the actual collection.…”
Section: Document Expansion From Web (mentioning)
confidence: 99%
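The citing text says only that a nonzero constant α is introduced to avoid zero probabilities. It does not say how α enters the model. The sketch below assumes additive (Lidstone) smoothing over a shared vocabulary, p(t) = (count(t) + α) / (N + α·|V|). Interpolation with a background model would be an equally plausible reading. It then ranks documents by D(Θ_c ‖ Θ_d), smallest first, following the rule that a smaller divergence means the document is closer to the collection. `smoothed_lm`, `rank_by_kl`, and the default `alpha=0.01` are all hypothetical.

```python
import math
from collections import Counter

def smoothed_lm(tokens, vocab, alpha=0.01):
    """Additive smoothing (assumed): p(t) = (count(t) + alpha) / (N + alpha * |V|)."""
    counts = Counter(tokens)
    denom = len(tokens) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / denom for t in vocab}

def rank_by_kl(collection_tokens, docs, alpha=0.01):
    """Return document indices ordered by D(collection || doc), closest first.

    alpha > 0 guarantees every vocabulary term has nonzero probability,
    so the divergence is always finite.
    """
    vocab = set(collection_tokens).union(*docs)
    p_c = smoothed_lm(collection_tokens, vocab, alpha)

    def kl(p_d):
        return sum(pc * math.log(pc / p_d[t]) for t, pc in p_c.items())

    scored = [(kl(smoothed_lm(doc, vocab, alpha)), i) for i, doc in enumerate(docs)]
    return [i for _, i in sorted(scored)]
```

For a collection `["stock", "market", "market", "bank"]` and documents `[["goal", "team"], ["market", "stock"]]`, the second document ranks first because it shares the collection's vocabulary. The first document gets a finite but much larger divergence instead of infinity.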