Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021
DOI: 10.1145/3404835.3463238

Pyserini: A Python Toolkit for Reproducible Information Retrieval Research with Sparse and Dense Representations

Cited by 180 publications (93 citation statements)
References 21 publications

Citation statements (ordered by relevance):
“…Accordingly, we evaluate the system runs with The New York Times Annotated Corpus and the topics of TREC Common Core 2017 [1]. As part of our experiments, we exploit the interactive search possibilities of the Pyserini toolkit [29]. We index the Core17 test collection with the help of Anserini [44] and the default indexing options as provided in the regression guide.…”
Section: Datasets and Implementation Details (mentioning)
confidence: 99%
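The indexing-plus-interactive-search workflow described in this excerpt can be sketched with a few lines of Pyserini. The example below is illustrative only: the index path, collection class, thread count, and query are placeholders rather than the citing authors' actual configuration, and recent Pyserini releases expose LuceneSearcher while older ones use SimpleSearcher.

```python
# Hedged sketch of the workflow in the excerpt above: build a Lucene index
# with Anserini's default options via Pyserini, then search it interactively
# with BM25. All paths, the collection class, and the query are placeholders.
#
# Indexing (shell), assuming the Core17/NYT documents are available locally:
#   python -m pyserini.index.lucene \
#     --collection NewYorkTimesCollection \
#     --input /path/to/nyt-corpus \
#     --index indexes/core17 \
#     --generator DefaultLuceneDocumentGenerator \
#     --threads 8 --storeRaw

from pyserini.search.lucene import LuceneSearcher  # older releases: pyserini.search.SimpleSearcher

searcher = LuceneSearcher('indexes/core17')  # hypothetical local index path
searcher.set_bm25(k1=0.9, b=0.4)             # Anserini's default BM25 parameters

hits = searcher.search('international organized crime', k=10)
for rank, hit in enumerate(hits, start=1):
    print(f'{rank:2} {hit.docid:20} {hit.score:.4f}')
```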
“…We calculated the TF-IDF index using DrQA implementation for all unigrams and bigrams with 2^24 buckets. Inspired by the criticism of choosing weak baselines presented in [38], we decided to validate our TF-IDF baseline against the proposed Anserini toolkit implemented by Pyserini [39].…”
Section: Document Retrieval (mentioning)
confidence: 99%
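For readers who want to reproduce this kind of baseline check, the sketch below contrasts a hashed unigram+bigram TF-IDF retriever (in the spirit of the DrQA setup, with 2^24 hash buckets) against a Pyserini BM25 run. The toy corpus, query, and index path are invented for illustration and are not the citing authors' actual data or pipeline.

```python
# Rough sketch: hashed unigram+bigram TF-IDF (2^24 buckets) versus a
# Pyserini BM25 reference run. Corpus, query, and index path are placeholders.
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import linear_kernel
from pyserini.search.lucene import LuceneSearcher

docs = ['aspirin reduces fever', 'bm25 is a strong lexical baseline']  # toy corpus
query = 'lexical retrieval baseline'

# Hashed TF-IDF over unigrams and bigrams, 2^24 buckets as in the excerpt.
hasher = HashingVectorizer(ngram_range=(1, 2), n_features=2**24, alternate_sign=False)
tfidf = TfidfTransformer()
doc_vecs = tfidf.fit_transform(hasher.transform(docs))
query_vec = tfidf.transform(hasher.transform([query]))
tfidf_scores = linear_kernel(query_vec, doc_vecs).ravel()
print('TF-IDF ranking:', tfidf_scores.argsort()[::-1])

# Reference BM25 run over the same collection, assuming it has been indexed
# with Pyserini/Anserini under this (hypothetical) path.
searcher = LuceneSearcher('indexes/toy-collection')
print('BM25 ranking:', [hit.docid for hit in searcher.search(query, k=10)])
```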
“…For the dense retrievers used in SPAR, we directly take the publicly released checkpoints without retraining to combine with Λ. We use Pyserini (Lin et al., 2021a) for all sparse models used in this work including BM25 and UniCOIL.…”
Section: Implementation Details (mentioning)
confidence: 99%
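The sparse side of the setup in this excerpt can be sketched with Pyserini alone: BM25 through a standard Lucene searcher and uniCOIL through an impact searcher. The prebuilt index names and the query-encoder identifier below are assumptions that vary across Pyserini versions, and the query is a placeholder.

```python
# Hedged sketch of Pyserini as the single entry point for sparse retrieval,
# covering BM25 and an impact-based model such as uniCOIL. Prebuilt index
# and query-encoder names are illustrative and version-dependent.
from pyserini.search.lucene import LuceneSearcher, LuceneImpactSearcher

query = 'what is the daily dose of vitamin c'

# Classic BM25 over a prebuilt MS MARCO passage index (name assumed).
bm25 = LuceneSearcher.from_prebuilt_index('msmarco-v1-passage')
bm25_hits = bm25.search(query, k=5)

# uniCOIL: learned term weights stored as Lucene impacts; the query is
# encoded on the fly with a matching query encoder (names assumed).
unicoil = LuceneImpactSearcher.from_prebuilt_index(
    'msmarco-v1-passage-unicoil', 'castorini/unicoil-msmarco-passage')
unicoil_hits = unicoil.search(query, k=5)

for name, hits in [('BM25', bm25_hits), ('uniCOIL', unicoil_hits)]:
    print(name, [(h.docid, round(float(h.score), 2)) for h in hits])
```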