Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1998
DOI: 10.1145/290941.290974

Effective retrieval with distributed collections

Abstract: This paper evaluates the retrieval effectiveness of distributed information retrieval systems in realistic environments. We find that when a large number of collections are available, retrieval effectiveness is significantly worse than that of centralized systems, mainly because typical queries are not adequate for choosing the right collections. We propose two techniques to address the problem. One is to use phrase information in the collection selection index and the other is query expansion…

Cited by 125 publications (104 citation statements)
References 10 publications (1 reference statement)
“…Finally, those result-lists are merged into a single list of documents to be presented to a user. A number of different approaches for database or collection selection have been proposed and individually evaluated [4,10,11,12,13,15,17,22,25]. Three of these approaches, CORI [4], CVV [25] and gGlOSS [11,12], were evaluated in a common environment by French et al. [3,7,8], who found that there was significant room for improvement in all approaches, especially when very few databases were selected.…”
Section: Distributed Retrieval, Database Selection and Results Merging
Confidence: 99%
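The CORI approach cited here ranks collections much as a retrieval model ranks documents, substituting document frequencies for term frequencies. Below is a minimal Python sketch of a CORI-style collection score using the commonly published default constants (50, 150, default belief 0.4); the dictionary layout of the collection statistics is an assumption made for illustration, not the original implementation.

    import math

    def cori_score(query_terms, coll, num_collections, cf, avg_cw, b=0.4):
        """Score one collection for a query with a CORI-style belief score.

        coll: dict with 'df' (term -> document frequency in this collection)
              and 'cw' (total number of term occurrences in the collection).
        cf:   term -> number of collections containing the term.
        The constants 50, 150 and b=0.4 are the commonly cited CORI defaults.
        """
        score = 0.0
        for t in query_terms:
            df = coll['df'].get(t, 0)
            if df == 0 or cf.get(t, 0) == 0:
                continue  # term absent from this collection or from all collections
            T = df / (df + 50 + 150 * coll['cw'] / avg_cw)
            I = math.log((num_collections + 0.5) / cf[t]) / math.log(num_collections + 1.0)
            score += b + (1 - b) * T * I
        return score / max(len(query_terms), 1)

Collections are then sorted by this score and only the top few are actually searched.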
“…Xu and Callan [22] showed that poor database selection performance hindered distributed retrieval performance, and investigated the use of query expansion and phrases in database selection. Viles and French [9,19] showed that dissemination of collection information increased retrieval effectiveness.…”
Section: Distributed Retrieval, Database Selection and Results Merging
Confidence: 99%
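Query expansion for database selection, as investigated by Xu and Callan, can be approximated with simple pseudo-relevance feedback. The sketch below is a generic stand-in under that assumption (append the most frequent non-query terms from a few top-ranked documents); it is not the paper's exact expansion method.

    from collections import Counter

    def expand_query(query_terms, top_docs, k=10):
        """Naive pseudo-relevance feedback: append the k most frequent
        non-query terms found in the top-ranked documents.

        top_docs: list of documents, each given as a list of tokens.
        """
        counts = Counter()
        for doc in top_docs:
            counts.update(t for t in doc if t not in query_terms)
        expansion = [t for t, _ in counts.most_common(k)]
        return list(query_terms) + expansion

The expanded query gives the selection index more evidence to match against collection statistics, addressing the problem noted in the abstract that typical short queries are not adequate for choosing the right collections.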
“…One representative example was the testbed created for TREC-5 (Harman, 1997), in which data on TREC CDs 2 and 4 was partitioned into 98 databases, each about 20 megabytes in size. Testbeds of about 100 databases each were also created based on TREC CDs 1 and 2 (Xu and Callan, 1998), TREC CDs 2 and 3 (Lu et al., 1996a; Xu and Callan, 1998), and TREC CDs 1, 2, and 3 (French et al., 1999; Callan, 1999a). A testbed of 921 databases was created by dividing the 20 gigabyte TREC Very Large Corpus (VLC) data into smaller databases (Callan, 1999c; French et al., 1999).…”
Section: Multi-Database Testbeds
Confidence: 99%
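As a rough illustration of how such testbeds are assembled, the sketch below splits a document stream into databases of roughly 20 megabytes each. The size-only rule is a simplifying assumption; the actual TREC testbeds were typically partitioned by source and publication date.

    def partition_corpus(docs, target_bytes=20 * 1024 * 1024):
        """Group (doc_id, text) pairs into databases of about target_bytes.

        Hypothetical helper for illustration only.
        """
        databases, current, size = [], [], 0
        for doc_id, text in docs:
            current.append(doc_id)
            size += len(text.encode('utf-8'))
            if size >= target_bytes:
                databases.append(current)
                current, size = [], 0
        if current:
            databases.append(current)
        return databases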
“…This task can be difficult because the document rankings and scores produced by each database are based on different corpus statistics and possibly different representations and/or retrieval algorithms; they usually cannot be compared directly. Solutions include computing normalized scores (Kwok et al., 1995; Viles and French, 1995; Kirsch, 1997; Xu and Callan, 1998), estimating normalized scores (Callan et al., 1995b; Lu et al., 1996a), and merging based on unnormalized scores (Dumais, 1994).…”
Section: Merging Document Rankings
Confidence: 99%
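One simple member of the "computing normalized scores" family mentioned above is to min-max normalize each database's scores into [0, 1] before interleaving. The sketch below illustrates that idea only; it is not the specific formula of any of the cited papers.

    def merge_runs(runs):
        """Merge per-database result lists by min-max normalizing scores.

        runs: list of result lists, each a list of (doc_id, raw_score).
        Returns one list of (doc_id, normalized_score), best first.
        """
        merged = []
        for run in runs:
            if not run:
                continue
            scores = [s for _, s in run]
            lo, hi = min(scores), max(scores)
            span = (hi - lo) or 1.0  # avoid division by zero for constant runs
            merged.extend((doc, (s - lo) / span) for doc, s in run)
        return sorted(merged, key=lambda pair: pair[1], reverse=True)

Normalization of this kind removes per-database scale differences but not the deeper incomparability of corpus statistics, which is why the literature above also explores estimating normalized scores.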