Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023
DOI: 10.1145/3539618.3591902

SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

Abstract: Traditionally, sparse retrieval systems that relied on lexical representations to retrieve documents, such as BM25, dominated information retrieval tasks. With the advent of pre-trained transformer models such as BERT, neural sparse retrieval has led to a new paradigm within retrieval. Despite this success, there has been limited software supporting different sparse retrievers running in a unified, common environment. This hinders practitioners from fairly comparing different sparse models and obtaining realistic eva…
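For context, a lexical sparse retriever such as BM25 scores documents by exact term overlap between query and document. The following is a minimal sketch of such a baseline using the rank_bm25 package; the corpus and query are illustrative placeholders and this is not the SPRINT toolkit's own interface.

    # Minimal BM25 sketch with the rank_bm25 package (illustrative only).
    from rank_bm25 import BM25Okapi

    corpus = [
        "neural sparse retrieval with transformer models",
        "bm25 is a classic lexical ranking function",
        "dense retrieval encodes queries and documents into vectors",
    ]
    tokenized_corpus = [doc.split() for doc in corpus]  # whitespace tokenization for brevity

    bm25 = BM25Okapi(tokenized_corpus)
    query = "lexical bm25 ranking".split()
    scores = bm25.get_scores(query)  # one relevance score per corpus document
    print(sorted(zip(scores, corpus), reverse=True)[0])  # best-scoring document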

Cited by 21 publications (34 citation statements) | References 46 publications (102 reference statements)
“…Data set. We conduct experiments on 8 chosen data sets (Sun et al., 2023) from BEIR (Thakur et al., 2021): Covid, Touche, DBPedia, SciFact, Signal, News, Robust04, and NFCorpus. Notice that our method is applicable regardless of whether the data set is actually labeled with corresponding graded relevance, since the final output of our method is just real-number ranking scores.…”
Section: Experiments Setup
Mentioning confidence: 99%
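The point about real-number ranking scores can be made concrete: graded relevance labels enter only at evaluation time, e.g. when computing nDCG over the produced ranking. A small sketch with scikit-learn's ndcg_score, where the scores and labels are made up for illustration:

    import numpy as np
    from sklearn.metrics import ndcg_score

    # Real-valued ranking scores produced by a retriever for five documents.
    predicted_scores = np.array([[0.92, 0.31, 0.87, 0.10, 0.55]])
    # Graded relevance judgments (0-3), used only for evaluation.
    true_relevance = np.array([[3, 0, 2, 0, 1]])

    print(ndcg_score(true_relevance, predicted_scores, k=5))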
“…We evaluate our prompts for zero-shot LLM ranking on 8 data sets from BEIR (Thakur et al., 2021). The results show that simply adding the intermediate relevance labels allows LLM rankers to achieve substantially higher ranking performance consistently across different data sets, regardless of whether the actual ground-truth labels of the data set contain multiple graded relevance levels.…”
Section: Introduction
Mentioning confidence: 99%
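As a rough illustration of that idea (the exact prompt wording and scoring used in the cited work are not reproduced here), a zero-shot LLM ranking prompt with intermediate relevance labels might look like the sketch below; the label set and the label-to-score mapping are assumptions made for illustration only.

    # Illustrative sketch: prompt with graded (intermediate) relevance labels
    # and a hypothetical mapping from the model's chosen label to a ranking score.
    PROMPT_TEMPLATE = (
        "Query: {query}\n"
        "Passage: {passage}\n"
        "Judge the relevance of the passage to the query as one of: "
        "Not Relevant, Somewhat Relevant, Highly Relevant.\n"
        "Label:"
    )

    LABEL_TO_SCORE = {"Not Relevant": 0.0, "Somewhat Relevant": 0.5, "Highly Relevant": 1.0}

    def score_from_label(label: str) -> float:
        # Convert the LLM's textual judgment into a real-valued ranking score.
        return LABEL_TO_SCORE.get(label.strip(), 0.0)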
“…Our evaluation uses MS MARCO passages [4] and BEIR datasets [33]. MS MARCO has 8.8M passages, while BEIR has 13 different datasets of varying sizes of up to 5.4M.…”
Section: Discussion
Mentioning confidence: 99%
“…This paper focuses on the SPLADE family of sparse representations [6][7][8] because it can deliver a high MRR@10 score for MS MARCO passage ranking [4] and strong zero-shot performance on the BEIR datasets [33], which are well-recognized IR benchmarks. The sparsification optimization in SPLADE has used L1 and FLOPS regularization to minimize non-zero weights during model learning, and our objective is to exploit additional opportunities to further increase the sparsity of inverted indices produced by SPLADE.…”
Section: Introduction
Mentioning confidence: 99%
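For readers unfamiliar with the FLOPS regularizer mentioned above, it penalizes the squared mean activation of each vocabulary term across a batch, pushing terms that fire often toward zero and thus sparsifying the learned representations. A minimal PyTorch sketch, where the tensor shape is an assumption for illustration:

    import torch

    def flops_regularizer(term_weights: torch.Tensor) -> torch.Tensor:
        # term_weights: (batch_size, vocab_size) non-negative term weights
        # produced by a learned sparse encoder such as SPLADE.
        mean_activation = term_weights.mean(dim=0)   # average weight of each vocabulary term
        return (mean_activation ** 2).sum()          # frequently active terms are penalized most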
“…There are alternative directions one may take to deploy a PLM ranker in a specific task for which no or limited training data is available. These include, for example, the zero-shot application of PLM rankers trained on another, resource-rich, retrieval task or domain [55,61], learning with few-shot examples [16], and approaches based on pseudo-labelling [59]. However, the effectiveness of these approaches depends on the relatedness of the fine-tuning task or the pre-training domain of the language model to the target retrieval task [60]; thus their generalization capabilities remain unclear.…”
Section: Introduction
Mentioning confidence: 99%