Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1998
DOI: 10.1145/290941.290974

Effective retrieval with distributed collections

Abstract: This paper evaluates the retrieval effectiveness of distributed information retrieval systems in realistic environments. We find that when a large number of collections are available, retrieval effectiveness is significantly worse than that of centralized systems, mainly because typical queries are not adequate for choosing the right collections. We propose two techniques to address the problem. One is to use phrase information in the collection selection index and the other is query expansion…

Cited by 125 publications (104 citation statements)
References 10 publications (1 reference statement)
“…Finally, those result-lists are merged into a single list of documents to be presented to a user. A number of different approaches for database or collection selection have been proposed and individually evaluated [4,10,11,12,13,15,17,22,25]. Three of these approaches, CORI [4], CVV [25] and gGlOSS [11,12], were evaluated in a common environment by French et al. [3,7,8], who found that there was significant room for improvement in all approaches, especially when very few databases were selected.…”
Section: Distributed Retrieval, Database Selection and Results Merging
Confidence: 99%
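The CORI approach cited here ranks collections much as a retrieval model ranks documents, substituting document frequencies for term frequencies. Below is a minimal Python sketch of a CORI-style collection score using the commonly published default constants (50, 150, default belief 0.4); the dictionary layout of the collection statistics is an assumption made for illustration, not the original implementation.

    import math

    def cori_score(query_terms, coll, num_collections, cf, avg_cw, b=0.4):
        """Score one collection for a query with a CORI-style belief score.

        coll: dict with 'df' (term -> document frequency in this collection)
              and 'cw' (total number of term occurrences in the collection).
        cf:   term -> number of collections containing the term.
        The constants 50, 150 and b=0.4 are the commonly cited CORI defaults.
        """
        score = 0.0
        for t in query_terms:
            df = coll['df'].get(t, 0)
            if df == 0 or cf.get(t, 0) == 0:
                continue  # term absent from this collection or from all collections
            T = df / (df + 50 + 150 * coll['cw'] / avg_cw)
            I = math.log((num_collections + 0.5) / cf[t]) / math.log(num_collections + 1.0)
            score += b + (1 - b) * T * I
        return score / max(len(query_terms), 1)

Collections are then sorted by this score and only the top few are actually searched.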
“…Xu and Callan [22] showed that poor database selection performance hindered distributed retrieval performance, and investigated the use of query expansion and phrases in database selection. Viles and French [9,19] showed that dissemination of collection information increased retrieval effectiveness.…”
Section: Distributed Retrieval, Database Selection and Results Merging
Confidence: 99%
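Query expansion for database selection, as investigated by Xu and Callan, can be approximated with simple pseudo-relevance feedback. The sketch below is a generic stand-in under that assumption (append the most frequent non-query terms from a few top-ranked documents); it is not the paper's exact expansion method.

    from collections import Counter

    def expand_query(query_terms, top_docs, k=10):
        """Naive pseudo-relevance feedback: append the k most frequent
        non-query terms found in the top-ranked documents.

        top_docs: list of documents, each given as a list of tokens.
        """
        counts = Counter()
        for doc in top_docs:
            counts.update(t for t in doc if t not in query_terms)
        expansion = [t for t, _ in counts.most_common(k)]
        return list(query_terms) + expansion

The expanded query gives the selection index more evidence to match against collection statistics, addressing the problem noted in the abstract that typical short queries are not adequate for choosing the right collections.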
“…One representative example was the testbed created for TREC-5 (Harman, 1997), in which data on TREC CDs 2 and 4 was partitioned into 98 databases, each about 20 megabytes in size. Testbeds of about 100 databases each were also created based on TREC CDs 1 and 2 (Xu and Callan, 1998), TREC CDs 2 and 3 (Lu et al., 1996a; Xu and Callan, 1998), and TREC CDs 1, 2, and 3 (French et al., 1999; Callan, 1999a). A testbed of 921 databases was created by dividing the 20 gigabyte TREC Very Large Corpus (VLC) data into smaller databases (Callan, 1999c; French et al., 1999).…”
Section: Multi-Database Testbeds
Confidence: 99%
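As a rough illustration of how such testbeds are assembled, the sketch below splits a document stream into databases of roughly 20 megabytes each. The size-only rule is a simplifying assumption; the actual TREC testbeds were typically partitioned by source and publication date.

    def partition_corpus(docs, target_bytes=20 * 1024 * 1024):
        """Group (doc_id, text) pairs into databases of about target_bytes.

        Hypothetical helper for illustration only.
        """
        databases, current, size = [], [], 0
        for doc_id, text in docs:
            current.append(doc_id)
            size += len(text.encode('utf-8'))
            if size >= target_bytes:
                databases.append(current)
                current, size = [], 0
        if current:
            databases.append(current)
        return databases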
“…This task can be difficult because the document rankings and scores produced by each database are based on different corpus statistics and possibly different representations and/or retrieval algorithms; they usually cannot be compared directly. Solutions include computing normalized scores (Kwok et al., 1995; Viles and French, 1995; Kirsch, 1997; Xu and Callan, 1998), estimating normalized scores (Callan et al., 1995b; Lu et al., 1996a), and merging based on unnormalized scores (Dumais, 1994).…”
Section: Merging Document Rankings
Confidence: 99%
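One simple member of the "computing normalized scores" family mentioned above is to min-max normalize each database's scores into [0, 1] before interleaving. The sketch below illustrates that idea only; it is not the specific formula of any of the cited papers.

    def merge_runs(runs):
        """Merge per-database result lists by min-max normalizing scores.

        runs: list of result lists, each a list of (doc_id, raw_score).
        Returns one list of (doc_id, normalized_score), best first.
        """
        merged = []
        for run in runs:
            if not run:
                continue
            scores = [s for _, s in run]
            lo, hi = min(scores), max(scores)
            span = (hi - lo) or 1.0  # avoid division by zero for constant runs
            merged.extend((doc, (s - lo) / span) for doc, s in run)
        return sorted(merged, key=lambda pair: pair[1], reverse=True)

Normalization of this kind removes per-database scale differences but not the deeper incomparability of corpus statistics, which is why the literature above also explores estimating normalized scores.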