2021
DOI: 10.48550/arxiv.2102.10073
Preprint

Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations

Abstract: Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing effective first-stage retrieval in a multistage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections. We aim to support, out of the box, the entire research lifecycle of efforts aimed at improving ranking with modern neural approaches. In particular, Pyserini suppo…
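Sparse first-stage retrieval, the role the abstract describes, usually means BM25 ranking over an inverted index; Pyserini builds this on the Lucene-based Anserini toolkit. The sketch below is a toy, pure-Python BM25 ranker meant only to show what the first stage computes. It is not Pyserini's implementation or API: the `BM25` class, the whitespace tokenizer, and the parameter values `k1 = 0.9`, `b = 0.4` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase + whitespace tokenizer (assumption); Lucene analyzers also stem and drop stopwords.
    return text.lower().split()


class BM25:
    """Toy in-memory BM25 ranker, for illustration only (not Pyserini's code)."""

    def __init__(self, docs, k1=0.9, b=0.4):
        self.k1, self.b = k1, b
        # Term frequencies per document, and document lengths in tokens.
        self.tfs = {docid: Counter(tokenize(text)) for docid, text in docs.items()}
        self.lens = {docid: sum(tf.values()) for docid, tf in self.tfs.items()}
        self.avgdl = sum(self.lens.values()) / len(self.lens)
        # Document frequency: number of documents containing each term.
        self.df = Counter(term for tf in self.tfs.values() for term in tf)
        self.n = len(docs)

    def idf(self, term):
        # Lucene-style BM25 IDF, which never goes negative.
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query, docid):
        tf, dl = self.tfs[docid], self.lens[docid]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                # Length normalization: longer-than-average documents are penalized via b.
                norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query, k=10):
        # First stage: score every document, drop non-matches, keep the top-k candidates.
        scored = [(docid, self.score(query, docid)) for docid in self.tfs]
        scored = [hit for hit in scored if hit[1] > 0]
        scored.sort(key=lambda hit: (-hit[1], hit[0]))  # ties broken by docid
        return scored[:k]


docs = {
    "d1": "pyserini supports sparse retrieval with bm25",
    "d2": "dense retrieval uses learned embeddings",
    "d3": "bm25 is a strong sparse baseline",
}
print(BM25(docs).search("sparse bm25", k=2))
```

The top-k list is the candidate set that a later reranking stage would rescore. In Pyserini itself, BM25 search over a pre-built index goes through `LuceneSearcher` (`SimpleSearcher` in older releases); the project documentation gives the exact usage for a given version.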

Cited by 17 publications (11 citation statements)
References 28 publications
“…This approach is used as the initial baseline method for text retriever to compare with other techniques. Pyserini is a simple Python package that aids researchers in reproducing their findings by offering excellent first-component document retrieval for multi-component rating systems [13]. Because Pyserini is simple yet effective, it was chosen to be implemented for the text retriever as a component of the QA system.…”
Section: Related Work (mentioning, confidence: 99%)
“…For sparse retrieval methods, we adopt the Pyserini [32] tool for experiments. For dense retrieval methods, we mainly focus on the DPR architecture.…”
Section: Implementation Details (mentioning, confidence: 99%)
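The statement above pairs Pyserini's sparse retrieval with DPR-style dense retrieval, where queries and passages are encoded into vectors and ranked by inner product. The sketch below shows only that retrieval mechanic. `toy_encode` is a hypothetical stand-in for a trained DPR encoder (a hashed bag of words, not a neural model), and `FlatDenseIndex` is a hypothetical brute-force index, similar in spirit to an exhaustive inner-product index in a vector library.

```python
import math
import zlib


def toy_encode(text, dim=16):
    # Hypothetical stand-in for a trained dense encoder (NOT DPR): hashed bag of words.
    # crc32 keeps bucket assignment deterministic across runs, unlike Python's hash().
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    # L2-normalize so the inner product equals cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class FlatDenseIndex:
    """Brute-force maximum-inner-product search over precomputed passage vectors."""

    def __init__(self, docs, encode=toy_encode):
        self.encode = encode
        self.docids = list(docs)
        # Passage embeddings are computed once, ahead of query time.
        self.vectors = [encode(docs[d]) for d in self.docids]

    def search_vector(self, qvec, k=10):
        # Score every stored passage against the query vector; keep the top-k.
        scored = [(d, dot(qvec, v)) for d, v in zip(self.docids, self.vectors)]
        scored.sort(key=lambda hit: (-hit[1], hit[0]))
        return scored[:k]

    def search(self, query, k=10):
        # Encode the query with the same encoder, then search by inner product.
        return self.search_vector(self.encode(query), k)


docs = {
    "a": "dense retrieval with dual encoders",
    "b": "sparse retrieval with bm25",
    "c": "cooking pasta at home",
}
print(FlatDenseIndex(docs).search("dense retrieval with dual encoders", k=2))
```

Unlike the sparse case, every passage receives a score, so relevance depends entirely on how well the encoder places related texts near each other; with a real DPR model the toy encoder would be replaced by separate trained query and passage encoders.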
“…Inferencing. To reproduce the ANCE-PRF results, the authors have provided us with a model checkpoint of PRF depth 3. Since there is no inference code available from the original authors, we utilise the open source IR toolkit Pyserini [16], which has already implemented the ANCE dense retriever, by introducing a second round of ANCE retrieval with the ANCE-PRF model checkpoint. During the inference time, the document index is the same for both the first round ANCE retrieval and the second round ANCE-PRF retrieval.…”
Section: Inferencing and Training (mentioning, confidence: 99%)
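The inference procedure quoted above has a simple shape: a first dense retrieval round, a PRF query encoder that reads the query together with the top passages (depth 3), and a second retrieval round against the same passage index. The sketch below reproduces only that control flow. `toy_encode`, `build_index`, `search`, and `two_round_prf` are hypothetical names, and the toy encoder is not ANCE or ANCE-PRF; in particular, feeding the concatenated query and feedback text to the same toy encoder is an assumption standing in for the trained ANCE-PRF checkpoint.

```python
import math
import zlib


def toy_encode(text, dim=16):
    # Hypothetical stand-in for a trained encoder (NOT ANCE or ANCE-PRF):
    # hashed bag of words, L2-normalized so inner product equals cosine similarity.
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(passages):
    # Passage embeddings are computed once; both retrieval rounds reuse this dict.
    return {pid: toy_encode(text) for pid, text in passages.items()}


def search(index, qvec, k):
    # Brute-force maximum-inner-product search over the shared passage index.
    scored = [(pid, sum(a * b for a, b in zip(qvec, vec))) for pid, vec in index.items()]
    scored.sort(key=lambda hit: (-hit[1], hit[0]))
    return scored[:k]


def two_round_prf(query, passages, index, k=10, prf_depth=3):
    # Round 1: ordinary dense retrieval with the query encoder.
    first = search(index, toy_encode(query), k)
    # The top prf_depth passages are treated as pseudo-relevant feedback.
    feedback = [passages[pid] for pid, _ in first[:prf_depth]]
    # A PRF query encoder reads the query plus the feedback text and emits a new
    # query vector. A trained PRF checkpoint would go here; this sketch simply
    # encodes the concatenated text with the same toy encoder (assumption).
    prf_qvec = toy_encode(" ".join([query] + feedback))
    # Round 2: the same passage index is searched again; no passage is re-encoded.
    second = search(index, prf_qvec, k)
    return first, second


passages = {
    "p1": "pyserini implements the ance dense retriever",
    "p2": "pseudo relevance feedback expands the query",
    "p3": "ance prf encodes the query with feedback passages",
    "p4": "bm25 is a sparse baseline",
    "p5": "recipes for dinner",
}
index = build_index(passages)
first, second = two_round_prf("ance pseudo relevance feedback", passages, index, k=4)
print(first)
print(second)
```

Because only the query vector changes between rounds, the passage embeddings from the first round can be reused unchanged, which is consistent with the quoted observation that the document index is the same for both rounds.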