Cluster-based language models for distributed retrieval

Xu, Jinxi; Croft, W. Bruce

doi:10.1145/312624.312687

Cited by 211 publications (229 citation statements). References 16 publications.


“…Viles and French [9,19] showed that dissemination of collection information increased retrieval effectiveness. Xu and Croft [23] explored cluster-based language models, investigating different ways to construct database selection indexes.…”

Section: Distributed Retrieval Database Selection and Results Merging (mentioning, confidence: 99%)


The impact of database selection on distributed searching

Powell, French, Callan, et al. (2000). Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.



The proliferation of online information resources increases the importance of effective and efficient distributed searching. Distributed searching is cast in three parts: database selection, query processing, and results merging. In this paper we examine the effect of database selection on retrieval performance. We look at retrieval performance in three different distributed retrieval testbeds and distill some general results. First, we find that good database selection can result in better retrieval effectiveness than can be achieved in a centralized database. Second, we find that good performance can be achieved when only a few sites are selected and that performance generally increases as more sites are selected. Finally, we find that when database selection is employed, it is not necessary to maintain collection-wide information (CWI), e.g., global idf. Local information can be used to achieve superior performance. This means that distributed systems can be engineered with more autonomy and less cooperation. This work suggests that improvements in database selection can lead to broader improvements in retrieval performance, even in centralized (i.e., single-database) systems. Given a centralized database and a good selection mechanism, retrieval performance can be improved by decomposing that database conceptually and employing a selection step.

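The three-part pipeline in this abstract (database selection, query processing, results merging), with each site using only its own local statistics instead of collection-wide information, can be illustrated with a small sketch. This is not the authors' testbed code. The function names (`doc_freq`, `select_databases`, `search_database`, `distributed_search`), the placeholder selection score (summed document frequency of the query terms, standing in for a real selection method), and the idf variant log(1 + N/df) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def doc_freq(docs, term):
    """Number of documents in one database that contain `term`."""
    return sum(1 for d in docs if term in d)

def select_databases(databases, query, k):
    """Stage 1 (database selection): keep the k databases with the highest
    placeholder score, the summed document frequency of the query terms."""
    def score(name):
        return sum(doc_freq(databases[name], t) for t in query)
    return sorted(databases, key=score, reverse=True)[:k]

def search_database(docs, query):
    """Stage 2 (query processing): tf-idf where idf comes from this
    database only, so no collection-wide information (CWI) is needed."""
    n = len(docs)
    idf = {}
    for t in set(query):
        df = doc_freq(docs, t)
        if df > 0:
            idf[t] = math.log(1 + n / df)
    hits = []
    for doc_id, doc in enumerate(docs):
        tf = Counter(doc)
        score = sum(tf[t] * idf[t] for t in idf if tf[t] > 0)
        if score > 0:
            hits.append((doc_id, score))
    return hits

def distributed_search(databases, query, k=1, n_results=10):
    """Stage 3 (results merging): pool hits from the selected databases
    and sort them by their raw local scores."""
    merged = []
    for name in select_databases(databases, query, k):
        for doc_id, score in search_database(databases[name], query):
            merged.append((name, doc_id, score))
    merged.sort(key=lambda hit: hit[2], reverse=True)
    return merged[:n_results]

# Hypothetical toy data: each database is a list of tokenized documents.
databases = {
    "news": [["election", "vote"], ["sports", "score"]],
    "sci": [["retrieval", "language", "model"], ["retrieval", "index"]],
    "misc": [["cooking"]],
}
print(distributed_search(databases, ["retrieval", "model"], k=1))
```

Sorting raw local scores is the simplest possible merge; scores computed from different local statistics are not always comparable across databases, which is what dedicated results-merging methods try to correct.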


“…However, in recent work, Xu and Croft [23] discuss the possibility that retrieval performance in a distributed environment may exceed performance in a centralized environment. In that work, Xu and Croft were pessimistic about the potential to achieve both retrieval efficiency and effectiveness in heterogeneous distributed environments.…”

Section: Introduction (mentioning, confidence: 99%)

The impact of database selection on distributed searching (Powell, French, Callan, et al., 2000)


“…While methods just merging collection documents to get aggregated statistics are among the earliest known (Zobel, 1997; Xu and Croft, 1999), the most popular approach uses a task-specific tf-idf approach. Document frequency in the collection is used instead of the sum of term frequencies, and inverse document frequency is approximated by inverse collection frequency (Callan et al., 1995).…”

Section: Resource Selection (mentioning, confidence: 99%)

Search for expertise: going beyond direct evidence

Serdyukov

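The resource-selection idea in the quote above (document frequency in place of term frequency, inverse collection frequency in place of idf) is the basis of the CORI formula of Callan et al. (1995). The sketch follows the form in which CORI is commonly reported: T = df / (df + 50 + 150 * cw / avg_cw), I = log((|C| + 0.5) / cf) / log(|C| + 1.0), and belief = b + (1 - b) * T * I with b = 0.4, where cw is a collection's word count, avg_cw the mean over collections, |C| the number of collections, and cf the number of collections containing the term. The constants are the usual published defaults; the function name `cori_scores` and averaging the belief over query terms are assumptions of this illustration, not a reference implementation.

```python
import math

def cori_scores(collections, query, b=0.4):
    """Score each collection for `query` with a CORI-style belief.
    `collections` maps a name to a list of tokenized documents."""
    num_coll = len(collections)
    # cw: total number of word occurrences in each collection
    cw = {name: sum(len(d) for d in docs) for name, docs in collections.items()}
    avg_cw = sum(cw.values()) / num_coll
    # df[name][term]: number of documents in that collection containing term
    df = {}
    for name, docs in collections.items():
        counts = {}
        for d in docs:
            for t in set(d):
                counts[t] = counts.get(t, 0) + 1
        df[name] = counts
    scores = {}
    for name in collections:
        total = 0.0
        for t in query:
            d = df[name].get(t, 0)
            # cf: number of collections that contain the term at all
            cf = sum(1 for counts in df.values() if t in counts)
            if cf == 0:
                belief = b  # term unseen everywhere: default belief only
            else:
                T = d / (d + 50 + 150 * cw[name] / avg_cw)
                I = math.log((num_coll + 0.5) / cf) / math.log(num_coll + 1.0)
                belief = b + (1 - b) * T * I
            total += belief
        scores[name] = total / len(query)
    return scores

# Hypothetical toy collections of tokenized documents.
toy_collections = {
    "news": [["election", "vote"], ["sports", "score"]],
    "sci": [["retrieval", "language", "model"], ["retrieval", "index"]],
    "misc": [["cooking"]],
}
print(cori_scores(toy_collections, ["retrieval"]))
```

A collection that never contains a query term receives only the default belief b, so ranking by these scores favours collections where the term is frequent (high T) and that term is rare across collections (high I).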

“…When applied to the problem of measuring the distance between two term distributions as in Language Modeling [23], KL estimates the relative entropy between the probability of a term $t$ occurring in the actual collection $\Theta_c$ (i.e. $p(t \mid \Theta_c)$), and the probability of the term $t$ occurring in the estimated Topic Language Model $\Theta_d$ (i.e.…”

Section: Document Expansion From Web (mentioning, confidence: 99%)

“…A nonzero constant α is introduced to alleviate the zero probability [23]. The smaller the KL divergence, the closer the document is to the actual collection.…”

Section: Document Expansion From Web (mentioning, confidence: 99%)

Investigation of the Effectiveness of Cross-Media Indexing

Yakici, Crestani. Lecture Notes in Computer Science.


Abstract. Cross-media analysis and indexing leverages the individual potential of the indexing information provided by each modality, such as speech, text and image, to improve the effectiveness of information retrieval and filtering in later stages. The process involves not only generating a merged representation of the digital content, such as MPEG-7, but also enriching it to help remedy the imprecision and noise introduced during the low-level analysis phases. It has been hypothesized that a system that combines different media descriptions of the same multi-modal audio-visual segment in a semantic space will perform better at retrieval and filtering time. To validate this hypothesis, we have developed a cross-media indexing system which utilises the Multiple Evidence approach, establishing links among the modality-specific textual descriptions in order to depict topical similarity.

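The KL-divergence measure quoted from Yakici and Crestani's paper compares the collection model $\Theta_c$ with a topic language model $\Theta_d$, $KL(\Theta_c \| \Theta_d) = \sum_t p(t \mid \Theta_c) \log \frac{p(t \mid \Theta_c)}{p(t \mid \Theta_d)}$, and uses a nonzero constant α to avoid zero probabilities. The quoted text does not say exactly how α is applied or which distribution is the reference, so this sketch assumes the collection model is the reference distribution and that α is additive smoothing of the topic model, $p(t \mid \Theta_d) = (c_d(t) + \alpha) / (|d| + \alpha |V|)$ over the shared vocabulary $V$. The helper names and the default α = 0.01 are hypothetical.

```python
import math
from collections import Counter

def unigram_model(tokens, vocab, alpha=0.0):
    """Unigram model over `vocab`; alpha > 0 applies additive smoothing
    so that every vocabulary term gets nonzero probability."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}

def kl_divergence(p, q):
    """KL(p || q) in nats, summed over terms with p(t) > 0.
    Raises ZeroDivisionError if q(t) == 0 where p(t) > 0, which is the
    zero-probability problem that smoothing with alpha avoids."""
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items() if pt > 0)

def kl_collection_to_topic(collection_tokens, topic_tokens, alpha=0.01):
    """Smaller values mean the topic model is closer to the collection."""
    vocab = set(collection_tokens) | set(topic_tokens)
    p_c = unigram_model(collection_tokens, vocab)          # unsmoothed collection model
    p_d = unigram_model(topic_tokens, vocab, alpha=alpha)  # smoothed topic model
    return kl_divergence(p_c, p_d)

# Hypothetical toy token lists.
collection = ["language", "model", "retrieval", "retrieval"]
near_topic = ["retrieval", "model", "language", "retrieval"]
far_topic = ["cooking", "recipe", "retrieval"]
print(kl_collection_to_topic(collection, near_topic))
print(kl_collection_to_topic(collection, far_topic))
```

Without α, `far_topic` assigns zero probability to "language" and "model", and the divergence would be undefined; with α the far topic simply receives a much larger divergence than the near one.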
