Distributed Information Retrieval

Callan, Jamie

doi:10.1007/0-306-47019-5_5

Cited by 228 publications

(354 citation statements)

References 17 publications

Supporting

Mentioning

328

Contrasting

Unclassified

“…The top ranked results returned from the selected collections are merged into a single list. Current collection selection methods compare the query with the summary of each collection (term statistics [11] or sample documents [17,16]) and rank collections accordingly.…”

Section: Background and Related Work (mentioning)

confidence: 99%

A Task-Based Evaluation of an Aggregated Search Interface

Sushmita

Joho

Lalmas

2009

String Processing and Information Retrieval

Abstract. This paper presents a user study that evaluated the effectiveness of an aggregated search interface in the context of non-navigational search tasks. An experimental system was developed to present search results aggregated from multiple information sources, and compared to a conventional tabbed interface. Sixteen participants were recruited to evaluate the performance of the two interfaces. Our results suggest that the aggregated search interface is a promising way of supporting non-navigational search tasks. The quantity and diversity of the retrieved items which participants accessed to complete a task, increased in the aggregated interface. Participants also found the aggregated presentation easier to access to retrieved items and to find relevant information, compared to the conventional interface.

“…From the database point of view, a distributed information retrieval system could follow a single database model or a multi-database model [4]. In the single database model, the documents are copied to a centralized database, where they are indexed and made searchable.…”

Section: Introduction (mentioning)

confidence: 99%

Performance Analysis of Distributed Architectures to Index One Terabyte of Text

Cacheda

Plachouras

Ounis

2004

Lecture Notes in Computer Science

Abstract. We simulate different architectures of a distributed Information Retrieval system on a very large Web collection, in order to work out the optimal setting for a particular set of resources. We analyse the effectiveness of a distributed, replicated and clustered architecture using a variable number of workstations. A collection of approximately 94 million documents and 1 terabyte of text is used to test the performance of the different architectures. We show that in a purely distributed architecture, the brokers become the bottleneck due to the high number of local answer sets to be sorted. In a replicated system, the network is the bottleneck due to the high number of query servers and the continuous data interchange with the brokers. Finally, we demonstrate that a clustered system will outperform a replicated system if a large number of query servers is used, mainly due to the reduction of the network load.

“…We address a particular kind of entity search, namely the search over multiple data sources, called federated search, which entails the three main problems of source representation, source selection, and result merging [8,9]. We focus on the latter for federated entity search in uncooperative settings as illustrated in Figure 1b, where only ranked result lists of entity descriptions are obtained from each source and no further information about the sources is available.…”

Section: Overview (mentioning)

confidence: 99%

“…This solution and the comprehensive work of the database community in this realm [5][6][7] assume full access to the entire datasets to compute features such as weights of attributes, co-occurrences or to learn parameters, which are then used to resolve all coreferences between two or more datasets in one run. However, access to the entire datasets is either not granted in many application scenarios such as search over multiple Web data sources (where data access is only provided via APIs for single requests), also called federated search over uncooperative sources [8,9], or many data sources are highly dynamic, imposing a high burden on batch processing to keep up with frequent changes and to provide fresh information for time sensitive applications such as search over stock quotes, movies and timetables. Distributed document retrieval for uncooperative environments has been studied in the IR community [8,9].…”

Section: Introduction (mentioning)

confidence: 99%

“…However, access to the entire datasets is either not granted in many application scenarios such as search over multiple Web data sources (where data access is only provided via APIs for single requests), also called federated search over uncooperative sources [8,9], or many data sources are highly dynamic, imposing a high burden on batch processing to keep up with frequent changes and to provide fresh information for time sensitive applications such as search over stock quotes, movies and timetables. Distributed document retrieval for uncooperative environments has been studied in the IR community [8,9]. We investigate the task of federated entity search with structured entities consisting of a varying number of attributes and corresponding values.…”

Section: Introduction (mentioning)

confidence: 99%

Federated Entity Search Using On-the-Fly Consolidation

Herzig

Mika

Blanco

et al. 2013

Lecture Notes in Computer Science

Abstract. Nowadays, search on the Web goes beyond the retrieval of textual Web sites and increasingly takes advantage of the growing amount of structured data. Of particular interest is entity search, where the units of retrieval are structured entities instead of textual documents. These entities reside in different sources, which may provide only limited information about their content and are therefore called "uncooperative". Further, these sources capture complementary but also redundant information about entities. In this environment of uncooperative data sources, we study the problem of federated entity search, where redundant information about entities is reduced on-the-fly through entity consolidation performed at query time. We propose a novel method for entity consolidation that is based on using language models and completely unsupervised, hence more suitable for this on-the-fly uncooperative setting than state-of-the-art methods that require training data. Further, we apply the same language model technique to deal with the federated search problem of ranking results returned from different sources. Particular novel are the mechanisms we propose to incorporate consolidation results into this ranking. We perform experiments using real Web queries and data sources. Our experiments show that our approach for federated entity search with on-the-fly consolidation improves upon the performance of a state-of-the-art preference aggregation baseline and also benefits from consolidation.
