In this work, we study recent advances in context-sensitive language models for the task of query expansion. We examine the behavior of existing and new approaches for lexical word-based expansion in both unsupervised and supervised settings. For unsupervised models, we study the Contextualized Embeddings for Query Expansion (CEQE) model. We introduce a new model, Supervised Contextualized Query Expansion with Transformers (SQET), which performs expansion as a supervised classification task and leverages context in pseudo-relevant results. We study the behavior of these expansion approaches for the tasks of ad-hoc document and passage retrieval. We conduct experiments combining expansion with probabilistic retrieval models as well as neural document ranking models. We evaluate expansion effectiveness on three standard TREC collections: Robust, Complex Answer Retrieval, and Deep Learning. We analyze extrinsic retrieval effectiveness and the intrinsic ability to rank expansion terms, and perform a qualitative analysis of the differences between the methods. We find that CEQE statistically significantly outperforms static embeddings across all three datasets on recall@1000. Moreover, CEQE outperforms static embedding-based expansion methods on multiple collections (by up to 18% on Robust and 31% on Deep Learning in average precision) and also improves over proven probabilistic pseudo-relevance feedback (PRF) models. SQET outperforms CEQE by 6% in P@20 on the intrinsic term ranking evaluation and is approximately as effective in retrieval performance. Models incorporating neural and CEQE-based expansion scores achieve gains of up to 5% in P@20 and 2% in AP on Robust over the state-of-the-art transformer-based re-ranking model, Birch.
Keywords Query expansion • Contextualized language models • Embeddings

This is an extension of CEQE: Contextualized Embeddings for Query Expansion, published in ECIR 2021. This work has several key differences and extensions over the previous work. First, it adds additional experimental results for CEQE on a third TREC dataset, Complex Answer Retrieval (CAR). These experiments include unsupervised retrieval and intrinsic evaluation results. Second, it proposes a new and previously unpublished contextual expansion model, SQET, a discriminatively trained supervised model that classifies expansion terms. The behavior of CEQE and SQET is compared on the Robust test collection for both extrinsic retrieval effectiveness and intrinsic ability to rank expansion terms. Finally, a qualitative comparison of CEQE and SQET terms, as well as a discussion of per-layer CEQE behavior, is provided. We estimate 30-50% of this work is new or significantly updated over the original paper.