In this work, we study recent advances in context-sensitive language models for the task of query expansion. We examine the behavior of existing and new approaches for lexical word-based expansion in both unsupervised and supervised settings. For unsupervised models, we study the Contextualized Embeddings for Query Expansion (CEQE) model. We introduce a new model, Supervised Contextualized Query Expansion with Transformers (SQET), which performs expansion as a supervised classification task and leverages context in pseudo-relevant results. We study the behavior of these expansion approaches for the tasks of ad-hoc document and passage retrieval. We conduct experiments combining expansion with probabilistic retrieval models as well as neural document ranking models. We evaluate expansion effectiveness on three standard TREC collections: Robust, Complex Answer Retrieval, and Deep Learning. We analyze extrinsic retrieval effectiveness and the intrinsic ability to rank expansion terms, and perform a qualitative analysis of the differences between the methods. We find that CEQE statistically significantly outperforms static embeddings across all three datasets on recall@1000. Moreover, CEQE outperforms static embedding-based expansion methods on multiple collections (by up to 18% on Robust and 31% on Deep Learning in average precision) and also improves over proven probabilistic pseudo-relevance feedback (PRF) models. SQET outperforms CEQE by 6% in P@20 on the intrinsic term ranking evaluation and is approximately as effective in retrieval performance. Models incorporating neural and CEQE-based expansion scores achieve gains of up to 5% in P@20 and 2% in AP on Robust over the state-of-the-art transformer-based re-ranking model, Birch.
Keywords Query expansion • Contextualized language models • Embeddings

This is an extension of CEQE: Contextualized Embeddings for Query Expansion, published in ECIR 2021. This work has several key differences and extensions over the previous work. First, it adds additional experimental results for CEQE on a third TREC dataset, Complex Answer Retrieval (CAR). These experiments include unsupervised retrieval and intrinsic evaluation results. Second, it proposes a new and previously unpublished contextual expansion model, SQET, a discriminatively trained supervised model that classifies expansion terms. The behavior of CEQE and SQET is compared on the Robust test collection for both extrinsic retrieval effectiveness and intrinsic ability to rank expansion terms. Finally, a qualitative comparison of CEQE and SQET terms, as well as a discussion of per-layer CEQE behavior, is provided. We estimate 30-50% of this work is new or significantly updated over the original paper.