Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023
DOI: 10.1145/3539618.3591902

SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

Abstract: Traditionally, sparse retrieval systems that relied on lexical representations to retrieve documents, such as BM25, dominated information retrieval tasks. With the advent of pre-trained transformer models such as BERT, neural sparse retrieval has led to a new paradigm within retrieval. Despite this success, there has been limited software supporting different sparse retrievers running in a unified, common environment. This hinders practitioners from fairly comparing different sparse models and obtaining realistic eva…
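For context, a lexical sparse retriever such as BM25 scores documents by exact term overlap between query and document. The following is a minimal sketch of such a baseline using the rank_bm25 package; the corpus and query are illustrative placeholders and this is not the SPRINT toolkit's own interface.

    # Minimal BM25 sketch with the rank_bm25 package (illustrative only).
    from rank_bm25 import BM25Okapi

    corpus = [
        "neural sparse retrieval with transformer models",
        "bm25 is a classic lexical ranking function",
        "dense retrieval encodes queries and documents into vectors",
    ]
    tokenized_corpus = [doc.split() for doc in corpus]  # whitespace tokenization for brevity

    bm25 = BM25Okapi(tokenized_corpus)
    query = "lexical bm25 ranking".split()
    scores = bm25.get_scores(query)  # one relevance score per corpus document
    print(sorted(zip(scores, corpus), reverse=True)[0])  # best-scoring document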

Cited by 21 publications (34 citation statements) | References 46 publications (102 reference statements)
“…Data set. We conduct experiments on 8 chosen data sets (Sun et al., 2023) from BEIR (Thakur et al., 2021): Covid, Touche, DBPedia, SciFact, Signal, News, Robust04, and NFCorpus. Notice that our method is applicable regardless of whether the data set is actually labeled with corresponding graded relevance, since the final output of our method is just real-number ranking scores.…”
Section: Experiments Setup
Mentioning confidence: 99%
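The point about real-number ranking scores can be made concrete: graded relevance labels enter only at evaluation time, e.g. when computing nDCG over the produced ranking. A small sketch with scikit-learn's ndcg_score, where the scores and labels are made up for illustration:

    import numpy as np
    from sklearn.metrics import ndcg_score

    # Real-valued ranking scores produced by a retriever for five documents.
    predicted_scores = np.array([[0.92, 0.31, 0.87, 0.10, 0.55]])
    # Graded relevance judgments (0-3), used only for evaluation.
    true_relevance = np.array([[3, 0, 2, 0, 1]])

    print(ndcg_score(true_relevance, predicted_scores, k=5))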
“…We evaluate our prompts for zero-shot LLM ranking on 8 data sets from BEIR (Thakur et al., 2021). The results show that simply adding the intermediate relevance labels allows LLM rankers to achieve substantially higher ranking performance consistently across different data sets, regardless of whether the actual ground-truth labels of the data set contain multiple graded relevance levels.…”
Section: Introduction
Mentioning confidence: 99%
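As a rough illustration of that idea (the exact prompt wording and scoring used in the cited work are not reproduced here), a zero-shot LLM ranking prompt with intermediate relevance labels might look like the sketch below; the label set and the label-to-score mapping are assumptions made for illustration only.

    # Illustrative sketch: prompt with graded (intermediate) relevance labels
    # and a hypothetical mapping from the model's chosen label to a ranking score.
    PROMPT_TEMPLATE = (
        "Query: {query}\n"
        "Passage: {passage}\n"
        "Judge the relevance of the passage to the query as one of: "
        "Not Relevant, Somewhat Relevant, Highly Relevant.\n"
        "Label:"
    )

    LABEL_TO_SCORE = {"Not Relevant": 0.0, "Somewhat Relevant": 0.5, "Highly Relevant": 1.0}

    def score_from_label(label: str) -> float:
        # Convert the LLM's textual judgment into a real-valued ranking score.
        return LABEL_TO_SCORE.get(label.strip(), 0.0)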
“…Our evaluation uses MS MARCO passages [4] and BEIR datasets [33]. MS MARCO has 8.8M passages, while BEIR has 13 different datasets of varying sizes of up to 5.4M.…”
Section: Discussion
Mentioning confidence: 99%
“…This paper focuses on the SPLADE family of sparse representations [6][7][8] because it can deliver a high MRR@10 score for MS MARCO passage ranking [4] and strong zero-shot performance on the BEIR datasets [33], which are well-recognized IR benchmarks. The sparsification optimization in SPLADE has used L1 and FLOPS regularization to minimize non-zero weights during model learning, and our objective is to exploit additional opportunities to further increase the sparsity of inverted indices produced by SPLADE.…”
Section: Introduction
Mentioning confidence: 99%
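For readers unfamiliar with the FLOPS regularizer mentioned above, it penalizes the squared mean activation of each vocabulary term across a batch, pushing terms that fire often toward zero and thus sparsifying the learned representations. A minimal PyTorch sketch, where the tensor shape is an assumption for illustration:

    import torch

    def flops_regularizer(term_weights: torch.Tensor) -> torch.Tensor:
        # term_weights: (batch_size, vocab_size) non-negative term weights
        # produced by a learned sparse encoder such as SPLADE.
        mean_activation = term_weights.mean(dim=0)   # average weight of each vocabulary term
        return (mean_activation ** 2).sum()          # frequently active terms are penalized most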
“…There are alternative directions one may take to deploy a PLM ranker in a specific task for which no or limited training data is available. These include, for example, the zero-shot application of PLM rankers trained on another, resource-rich, retrieval task or domain [55,61], learning with few-shot examples [16], and approaches based on pseudo-labelling [59]. However, the effectiveness of these approaches depends on the relatedness of the fine-tuning task or the pre-training domain of the language model to the target retrieval task [60]; thus their generalization capabilities remain unclear.…”
Section: Introduction
Mentioning confidence: 99%