2021
DOI: 10.48550/arxiv.2104.08663
Preprint
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

Abstract: Neural IR models have often been studied in homogeneous and narrow settings, which has considerably limited insights into their generalization capabilities. To address this, and to allow researchers to more broadly establish the effectiveness of their models, we introduce BEIR (Benchmarking IR), a heterogeneous benchmark for information retrieval. We leverage a careful selection of 17 datasets for evaluation spanning diverse retrieval tasks including open-domain datasets as well as narrow expert domains. We st…

Cited by 53 publications (115 citation statements)
References 32 publications
“…In fact, we already know the answer, at least in part: learned representations often perform terribly in out-of-distribution settings when applied in a zero-shot manner. Evidence comes from the BEIR benchmark [Thakur et al., 2021], which aims to evaluate the effectiveness of dense retrieval models across diverse domains. Results show that directly applying a dense retrieval model trained on one dataset to another dataset sometimes yields effectiveness that is worse than BM25.…”
Section: Discussion
confidence: 99%
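The zero-shot setup this statement describes is simple to reproduce. Below is a minimal sketch following the beir package's published quickstart: a dense retriever trained on MS MARCO is applied unchanged to an out-of-domain BEIR dataset. The dataset (SciFact) and checkpoint name are illustrative choices, not ones taken from any citing paper.

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Illustrative dataset choice: SciFact, one of the BEIR tasks.
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# Zero-shot: this dense model was trained on MS MARCO, never on SciFact.
model = DRES(models.SentenceBERT("msmarco-distilbert-base-v3"), batch_size=16)
retriever = EvaluateRetrieval(model, score_function="cos_sim")

results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)  # nDCG@10 is the headline metric reported in the BEIR paper
```

Running the same evaluation with a BM25 baseline on the same qrels is what produces the comparisons the quoted statement refers to.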
“…For example, Li et al. [2021] proposed model uncertainty fusion as a solution. The BEIR benchmark [Thakur et al., 2021] provides a resource to evaluate progress, and the latest results show that learned sparse representations are able to outperform BM25 [Formal et al., 2021a]. At a high level, there are at least three intertwined research questions:…”
Section: Discussion
confidence: 99%
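For context on the learned sparse representations this statement mentions: SPLADE-family models (Formal et al., 2021a) turn masked-language-model logits into one non-negative weight per vocabulary term via log-saturated pooling. A minimal PyTorch sketch of that pooling step, under assumed input shapes; the function name and the use of sum (rather than max) pooling are illustrative.

```python
import torch

def splade_pool(mlm_logits: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # mlm_logits: (batch, seq_len, vocab_size) MLM scores from a BERT-style encoder
    # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    sat = torch.log1p(torch.relu(mlm_logits))   # log(1 + ReLU(w_ij)) saturation
    sat = sat * attention_mask.unsqueeze(-1)    # zero out padding positions
    return sat.sum(dim=1)                       # one sparse weight per vocabulary term

# Toy shapes only; a real model supplies the logits.
weights = splade_pool(torch.randn(2, 8, 30522), torch.ones(2, 8))
print(weights.shape)  # torch.Size([2, 30522])
```

The resulting |V|-dimensional vectors are mostly zero, so they can be indexed and searched like classical term vectors while still being learned end to end.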
“…A potential solution is to train a dense retriever on a large retrieval dataset such as MS-MARCO, and then apply it to new domains, a setting referred to as zero-shot. Unfortunately, in this setting dense retrievers are often outperformed by classical methods based on term frequency, which do not require supervision (Thakur et al., 2021).…”
Section: Introduction
confidence: 99%
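The "classical methods based on term frequency" in this statement are typified by BM25, which scores documents from corpus statistics alone, with no training data. A self-contained sketch using the rank_bm25 package (an illustrative choice; BEIR itself wraps an Elasticsearch-based BM25), with an invented toy corpus:

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Invented toy corpus for illustration only.
corpus = [
    "BEIR evaluates retrieval models across heterogeneous domains",
    "BM25 is a term frequency baseline that needs no training data",
    "dense retrievers are trained on large labelled datasets like MS MARCO",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)  # no supervision: only term and length statistics

query = "unsupervised term frequency baseline".lower().split()
print(bm25.get_scores(query))              # one relevance score per document
print(bm25.get_top_n(query, corpus, n=2))  # top-2 documents for the query
```

Because no labelled data enters this pipeline, BM25 transfers to new domains for free, which is exactly why it remains a hard zero-shot baseline.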
“…Here, neural approaches lead to enormous effectiveness gains over traditional techniques [8,13,20,24]. A valid concern is the generalizability and applicability of the developed techniques to other domains and settings [16,14,31,34].…”
Section: Introduction
confidence: 99%