2000
DOI: 10.1145/333135.333136

Evaluating the performance of distributed architectures for information retrieval using a variety of workloads

Abstract: The information explosion across the Internet and elsewhere offers access to an increasing number of document collections. In order for users to effectively access these collections, information retrieval (IR) systems must provide coordinated, concurrent, and distributed access. In this article, we explore how to achieve scalable performance in a distributed system for collection sizes ranging from 1GB to 128GB. We implement a fully functional distributed IR system based on a multithreaded version of the InQuery…


Citations: Cited by 49 publications (37 citation statements)
References: 30 publications
“…We demonstrate that partial replicas can significantly outperform caches using a validated simulator [7,18] which closely matches our working prototype system with replica selection. The prototype uses InQuery for the basic IR functionality [8].…”
Section: Introduction (mentioning)
confidence: 69%
“…Most of the previous work experiments with a text database less than 1 GB and focuses on speedup when a text database is distributed over more servers [5,12,17,20]. Only Couvreur et al [9], and Cahoon et al [6,7] use simulation to experiment with more than 100 GB of data. None of these previous studies include partial replication or caching.…”
Section: Scalable IR Architectures (mentioning)
confidence: 99%
“…Parallel generation of a global index has been studied in [18], while a system which crawls the Web and builds a distributed local index was presented in [16]. Cahoon et al [4] evaluated the computational performance of local indices under a variety of workloads, and Hawking [6] examined scalability issues of local index organizations. The prototype of Google was reported as using global index partitioning [3].…”
Section: Index Structure and Query Processing Models (mentioning)
confidence: 99%
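The distinction drawn in the statement above between local and global index organizations is easy to see in code. The following is a minimal, illustrative sketch and is not taken from any of the cited systems: a local (document-partitioned) organization assigns whole documents to servers and each server indexes only its own slice, while a global (term-partitioned) organization assigns terms to servers so that each server holds the complete posting list for its share of the vocabulary. The function names and toy corpus are assumptions made for illustration.

from collections import defaultdict

def build_local_indices(docs, num_servers):
    """Document-partitioned ("local") organization: server i indexes every term
    of the documents assigned to it."""
    indices = [defaultdict(list) for _ in range(num_servers)]
    for doc_id, text in docs.items():
        server = doc_id % num_servers              # whole documents go to one server
        for term in set(text.lower().split()):
            indices[server][term].append(doc_id)
    return indices

def build_global_index(docs, num_servers):
    """Term-partitioned ("global") organization: exactly one server holds the
    full posting list for each term."""
    indices = [defaultdict(list) for _ in range(num_servers)]
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            indices[hash(term) % num_servers][term].append(doc_id)
    return indices

docs = {0: "distributed information retrieval",
        1: "evaluating retrieval performance"}
print(build_local_indices(docs, 2))   # each server sees only its own documents
print(build_global_index(docs, 2))    # each term's postings live on one server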
“…The prototype of Google was reported as using global index partitioning [3]. Many of the above mentioned works [4,18,6,17,20] describe essentially the same model for processing queries in systems with segmented indices:…”
Section: Index Structure and Query Processing Models (mentioning)
confidence: 99%
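The shared query processing model this statement refers to is usually described, for document-partitioned (segmented) indices, as a scatter-gather scheme: a broker broadcasts the query to every index segment, each segment evaluates it locally and returns its top-k documents, and the broker merges the partial result lists. The sketch below is a hypothetical illustration of that scheme, not code from any of the cited papers; Segment, broker_query, and the precomputed per-document scores are all made up.

import heapq

class Segment:
    """One index segment holding a disjoint subset of the documents."""
    def __init__(self, scores):
        self.scores = scores        # {doc_id: relevance score} under the current query

    def top_k(self, k):
        # A real segment would evaluate the query against its local inverted index;
        # the per-document scores are precomputed here for brevity.
        return heapq.nlargest(k, self.scores.items(), key=lambda kv: kv[1])

def broker_query(segments, k=10):
    """Scatter the request to all segments, gather their partial top-k lists, merge."""
    partial = [hit for seg in segments for hit in seg.top_k(k)]
    return heapq.nlargest(k, partial, key=lambda kv: kv[1])

segments = [Segment({1: 0.9, 2: 0.4}), Segment({3: 0.7, 4: 0.2})]
print(broker_query(segments, k=3))    # -> [(1, 0.9), (3, 0.7), (2, 0.4)]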
“…We downloaded between 1 and 2,303 pages per site by crawling the first two levels of the site. Then, we generated a set of keyword queries from the downloaded pages using term frequencies commonly observed in information retrieval systems (from [4]). We used standard information retrieval techniques to determine which queries matched which documents; namely, TF/IDF weights and the cosine distance [2].…”
Section: Content Sets and Simulation Setup (mentioning)
confidence: 99%
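The matching step described in the last statement, TF/IDF weights combined with the cosine measure, can be summarized in a few lines. The sketch below is a generic illustration of that standard technique rather than the authors' implementation; the toy corpus, the query, and the function names are assumptions.

import math
from collections import Counter

def tfidf_vector(tokens, df, n_docs):
    """TF * IDF weight for each term; terms unseen in the corpus are dropped."""
    tf = Counter(tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf if df[t]}

def cosine(u, v):
    """Cosine similarity between two sparse vectors represented as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

docs = [["distributed", "information", "retrieval"],
        ["retrieval", "performance", "evaluation"],
        ["web", "crawling", "and", "indexing"]]
df = Counter(t for d in docs for t in set(d))               # document frequency per term
doc_vecs = [tfidf_vector(d, df, len(docs)) for d in docs]

query = ["information", "retrieval"]
q_vec = tfidf_vector(query, df, len(docs))
ranking = sorted(range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True)
print(ranking)                                              # doc 0 matches best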