Purpose
Search engines face many challenges in crawling the deep web because of its proprietary nature and dynamic content. Distributed Information Retrieval (DIR) addresses these problems by providing a unified searchable interface to such databases. Because a DIR system must search across many databases, selecting the databases most relevant to a user query is challenging. This challenge can be addressed by considering users' past queries during collection selection, in combination with word embedding techniques. Combining the two helps the best-performing collection selection method speed up the retrieval performance of DIR solutions.
Design/methodology/approach
The authors propose a collection selection model based on word embedding, using the Word2Vec approach to learn the similarity between the current query and past queries. The cosine and transformed cosine similarity models are used to compute similarities among queries. The experiments are conducted on three standard TREC testbeds created for federated search.
Findings
The results show significant improvements over the baseline models.
Originality/value
Although lexical matching models for collection selection based on similarity to past queries exist, to the best of the authors' knowledge, the proposed work is the first of its kind to use word embedding for collection selection by learning from past queries.
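The core mechanism described above can be illustrated with a minimal sketch: represent each query as the average of its terms' word vectors and compare the current query to past queries with cosine similarity. The toy embedding values below are hypothetical stand-ins for a trained Word2Vec model, not the vectors or similarity variant used in the paper.

```python
import numpy as np

# Hypothetical word vectors standing in for a trained Word2Vec model.
embeddings = {
    "deep":   np.array([0.20, 0.70, 0.10]),
    "web":    np.array([0.30, 0.60, 0.20]),
    "search": np.array([0.40, 0.50, 0.30]),
    "hidden": np.array([0.25, 0.65, 0.15]),
}

def query_vector(query):
    """Average the word vectors of the query terms (a common query-embedding baseline)."""
    vecs = [embeddings[t] for t in query.lower().split() if t in embeddings]
    return np.mean(vecs, axis=0)

def cosine(u, v):
    """Cosine similarity between two query vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

past_query = "deep web search"
current_query = "hidden web search"
similarity = cosine(query_vector(past_query), query_vector(current_query))
print(similarity)
```

A DIR broker could then route the current query to the collections that best answered the most similar past queries.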
Federated search routes a user's query to multiple component collections and presents a merged result list, ranked by comparing the relevance score of each returned result. However, the heterogeneity of the component collections makes it challenging for the central broker to compare these relevance scores when fusing the results into a ranked order. To address this issue, most existing approaches merge the returned results either by converting document ranks into ranking scores or by downloading the documents and computing their relevance scores at query time. However, these approaches are inefficient: the former suffers from limited merging effectiveness because document overlap among the component collections is negligible, and the latter is resource intensive. This research addresses the problem by proposing a new method that extracts features of both the documents and the component collections from the information the collections provide at query time. Each document's features and its collection's features are then exploited together to establish the document's relevance score. Ant colony optimization is then used for information foraging to create the merged result list. Empirical results on a real-world dataset demonstrate significant improvements by the proposed approach over baseline approaches.
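The key idea of making scores comparable across heterogeneous collections can be sketched as follows. This is an illustrative simplification, not the authors' model: it combines one document-level feature (rank within the component collection) with one collection-level feature (a quality prior) into a single score, then sorts the pooled results. The feature names and weights are hypothetical; the paper's method uses richer features and ant colony optimization for the final ordering.

```python
def merge_results(results_by_collection, collection_quality, w_doc=0.7, w_coll=0.3):
    """Assign each document a comparable score from document- and
    collection-level features, then return one merged ranked list."""
    merged = []
    for coll, docs in results_by_collection.items():
        for rank, doc_id in enumerate(docs, start=1):
            # Reciprocal rank captures the document's standing in its own
            # collection; the quality prior captures the collection itself.
            score = w_doc * (1.0 / rank) + w_coll * collection_quality[coll]
            merged.append((doc_id, score))
    return sorted(merged, key=lambda item: item[1], reverse=True)

results = {"CollectionA": ["a1", "a2"], "CollectionB": ["b1"]}
quality = {"CollectionA": 0.9, "CollectionB": 0.5}
for doc_id, score in merge_results(results, quality):
    print(doc_id, round(score, 2))
```

Because every score is computed on the same scale by the broker, documents from different collections can be interleaved directly, avoiding both rank-to-score conversion and query-time document downloads.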
In federated search, the central broker simultaneously forwards the search query to multiple resources. The results returned from those resources are then merged into a single ranked list. An autonomous resource in a federated search system usually does not provide scores for the retrieved documents; even when some do, scores from different resources are incomparable because the resources differ in many aspects, such as retrieval models and corpus statistics. Many results merging approaches have been proposed in the literature to deal with this problem. However, to the best of our knowledge, none of them has utilised snippets. This article proposes a snippet-based weighting scheme for the query terms involved. It quantifies the importance of each query term from two angles: the frequency of the term and the part of the snippet in which the term occurs. Three parts – URL, title, and description – are given different weights. Experiments are conducted with the TREC 2013 FedWeb data set. The results show that the proposed methods consistently outperform several baseline models. We also find that, in many instances, a further slight performance improvement is achievable by additionally weighting each of the resources involved, which can be done in the resource selection phase.
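The snippet-based weighting scheme can be sketched in a few lines: each occurrence of a query term contributes a weight that depends on which snippet part (URL, title, or description) it appears in. The part weights below are hypothetical placeholders, not the values used in the paper.

```python
# Hypothetical part weights; the paper assigns different weights to the
# three snippet parts, but these specific values are illustrative only.
PART_WEIGHTS = {"url": 3.0, "title": 2.0, "description": 1.0}

def snippet_score(query_terms, snippet):
    """Score a snippet by summing, for each query term occurrence,
    the weight of the snippet part it occurs in."""
    score = 0.0
    for part, weight in PART_WEIGHTS.items():
        text = snippet.get(part, "").lower()
        for term in query_terms:
            score += weight * text.count(term.lower())
    return score

snippet = {
    "url": "example.org/federated-search",
    "title": "Federated Search Basics",
    "description": "An overview of federated search and result merging.",
}
print(snippet_score(["federated", "search"], snippet))
```

Documents (and, by aggregation, resources) can then be ranked by these snippet scores without downloading full documents or relying on the resources' incomparable native scores.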