We consider a digital library distributed in a tightly coupled environment. The library is indexed by inverted files, and the vector space model is used as the ranking strategy. Using a simple analytical model coupled with a small simulator, we study how query performance is affected by the index organization, the network speed, and the disk transfer rate. Our results, which are based on the Tipster/Trec3 collection, indicate that a global index organization might outperform a local index organization.
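The difference between the two index organizations can be illustrated with a small sketch. It assumes the usual meaning of the terms: in a local index organization, the documents are split among the servers and each server builds an inverted file over its own documents; in a global index organization, a single inverted file over the whole collection is split among the servers by term. This is not the authors' code. All function names are hypothetical, the ranking is a simplified tf-idf dot product (no cosine length normalization), and idf is assumed to be computed over the whole collection and shared with every server. Under those assumptions both organizations return the same scores. They differ in which servers do work for a query and how much data each one reads and ships over the network.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Inverted file: term -> {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


def idf_table(index, n_docs):
    """Collection-wide inverse document frequencies, shared with all servers."""
    return {term: math.log(n_docs / len(postings)) for term, postings in index.items()}


def score_postings(index, idf, query_terms):
    """Partial scores from the postings this index holds.

    Simplified vector space model (assumption): tf-idf dot product between the
    query (each term weighted by idf) and the document, no length normalization.
    """
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += (tf * idf[term]) * idf[term]
    return scores


def local_partition(docs, n_servers):
    """Local index organization: documents are split among servers and each
    server builds an inverted file over its own documents only."""
    shards = [{} for _ in range(n_servers)]
    for i, doc_id in enumerate(sorted(docs)):
        shards[i % n_servers][doc_id] = docs[doc_id]
    return [build_index(shard) for shard in shards]


def global_partition(index, n_servers):
    """Global index organization: one inverted file for the whole collection,
    split among servers by term (each term's full postings list on one server)."""
    shards = [{} for _ in range(n_servers)]
    for i, term in enumerate(sorted(index)):
        shards[i % n_servers][term] = index[term]
    return shards


def query(shards, idf, query_terms, k):
    """Broker: collect partial scores from every server, sum them, return top-k.

    With a global index only the servers owning the query terms contribute;
    the others return nothing, so a real broker would skip them.
    """
    total = defaultdict(float)
    for shard in shards:
        for doc_id, s in score_postings(shard, idf, query_terms).items():
            total[doc_id] += s
    return sorted(total.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical toy collection.
docs = {
    1: "inverted file index",
    2: "vector space model ranking",
    3: "index server network disk",
    4: "global index local index",
}
full_index = build_index(docs)
idf = idf_table(full_index, len(docs))
print(query(local_partition(docs, 2), idf, ["index", "disk"], k=3))
print(query(global_partition(full_index, 2), idf, ["index", "disk"], k=3))
```

With the local organization every server scans the (shorter) postings for all query terms. With the global organization only the servers holding those terms do any work, but each of them scans the full postings lists.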
Search engines are a key component of the Web economy today. Despite that, there is little technical literature available on their design, fine-tuning, and internal operation. In this work, we make a preliminary attempt to partially fill this gap. We observe that Web query processing is composed of two phases: (a) retrieving information on the documents related to the query and ranking them, and (b) generating the snippets, titles, and URL information for the answer page. The cost of the second phase is essentially independent of the size of the collection, while the cost of the first phase depends on it. Thus, we concentrate here on the behavior of a search engine while executing the first phase of query processing. Using real data and a small cluster of index servers, we study four basic and key issues related to this phase: load balance, broker behavior, performance of individual index servers, and overall throughput. Our study, while preliminary, reveals interesting tradeoffs: (1) load imbalance at low query arrival rates can be controlled with the simple measure of randomizing the distribution of documents among the index servers, (2) the broker is not a bottleneck, (3) disk and CPU utilization at individual servers depend on the relationship between memory size and the distribution of frequencies of the query terms, and (4) load imbalance at high loads prevents higher throughput. Our results suggest that further study and evaluation of search engines is a promising research avenue.
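A minimal sketch of two pieces of this first phase: randomizing the placement of documents on index servers, and the broker merging each server's partial answer. Everything here is an assumption for illustration. The function names are hypothetical, per-query server load is approximated by the number of postings a server scans for the query terms, and the random placement shuffles the documents and deals them round-robin so each server holds the same number of documents. The broker step relies on the fact that, when documents are partitioned, the global top-k is always contained in the union of the servers' local top-k lists, so the broker only merges k results per server.

```python
import heapq
import random


def random_partition(doc_ids, n_servers, seed=0):
    """Randomized document placement (assumed scheme): shuffle the documents,
    then deal them round-robin so every server gets (nearly) the same count."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    return [ids[s::n_servers] for s in range(n_servers)]


def server_loads(partition, postings, query_terms):
    """Per-server work for one query, approximated (hypothetically) as the
    number of postings entries each server scans for the query terms."""
    loads = []
    for docs_on_server in partition:
        local = set(docs_on_server)
        loads.append(sum(len(postings.get(t, set()) & local) for t in query_terms))
    return loads


def imbalance(loads):
    """Maximum load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


def broker_merge(partial_results, k):
    """Broker: each server sends its local top-k as (score, doc_id) pairs.
    Because each document lives on exactly one server, the global top-k is
    among these pairs, so the broker only keeps the k largest."""
    return heapq.nlargest(k, (pair for result in partial_results for pair in result))


# Hypothetical postings (term -> set of doc ids) for 100 documents, where
# the documents containing "trec" happen to be numbered contiguously.
postings = {"trec": set(range(0, 50)), "query": set(range(40, 100))}
contiguous = [list(range(0, 50)), list(range(50, 100))]
randomized = random_partition(range(100), 2)

print(imbalance(server_loads(contiguous, postings, ["trec"])))  # 2.0
print(imbalance(server_loads(randomized, postings, ["trec"])))
print(broker_merge([[(0.9, 3), (0.4, 7)], [(0.8, 12), (0.5, 1)]], k=3))
# [(0.9, 3), (0.8, 12), (0.5, 1)]
```

In the example, placing every document that contains `trec` on one of two servers gives an imbalance of 2.0 for that term, the worst case with two servers. Random placement tends to spread such correlated documents across servers. The broker's merge touches only k pairs per server, which is consistent with it being cheap relative to the index servers.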