Proceedings of the 29th ACM International Conference on Information & Knowledge Management 2020
DOI: 10.1145/3340531.3412780

Flexible IR Pipelines with Capreolus

Abstract: While a number of recent open-source toolkits for training and using neural information retrieval models have greatly simplified experiments with neural reranking methods, they essentially hard code a "search-then-rerank" experimental pipeline. These pipelines consist of an efficient first-stage ranking method, like BM25, followed by a neural reranking method. Deviations from this setup often require hacks; some improvements, like adding a second reranking step that uses a more expensive neural method, are inf…
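
To make the "search-then-rerank" setup concrete, here is a minimal sketch of a pipeline expressed as a list of stages, so that adding a second reranking step means appending one more entry. It is an illustration under stated assumptions, not the Capreolus API: `DOCS`, `bm25_stage`, `overlap_stage`, and `run_pipeline` are hypothetical names, the corpus is a toy dictionary, and the term-overlap scorer is only a stand-in for a neural reranker.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical toy corpus; a real experiment would query an indexed collection.
DOCS: Dict[str, str] = {
    "d1": "neural reranking with bert",
    "d2": "bm25 is an efficient first stage ranking method",
    "d3": "cooking pasta at home",
}

Ranking = List[Tuple[str, float]]
Stage = Callable[[str, List[str]], Ranking]


def bm25_stage(query: str, doc_ids: List[str], k1: float = 0.9, b: float = 0.4) -> Ranking:
    """First stage: plain BM25 over whitespace tokens of the candidate set."""
    if not doc_ids:
        return []
    tokenized = {d: DOCS[d].split() for d in doc_ids}
    avg_len = sum(len(t) for t in tokenized.values()) / len(tokenized)
    n = len(tokenized)
    scored = []
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.split():
            df = sum(1 for t in tokenized.values() if term in t)
            if df == 0:
                continue  # term absent from every candidate
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def overlap_stage(query: str, doc_ids: List[str]) -> Ranking:
    """Stand-in for a neural reranker: fraction of query terms found in the document."""
    q_terms = set(query.split())
    scored = [
        (d, len(q_terms & set(DOCS[d].split())) / max(len(q_terms), 1))
        for d in doc_ids
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def run_pipeline(query: str, stages: List[Tuple[Stage, int]]) -> Ranking:
    """Run each stage on the previous stage's top-`depth` candidates."""
    candidates = list(DOCS)
    ranking: Ranking = []
    for stage, depth in stages:
        ranking = stage(query, candidates)[:depth]
        candidates = [doc_id for doc_id, _ in ranking]
    return ranking


# Search-then-rerank, then the same pipeline with an extra reranking step.
two_stage = run_pipeline("bm25 ranking", [(bm25_stage, 2), (overlap_stage, 1)])
three_stage = run_pipeline(
    "bm25 ranking", [(bm25_stage, 3), (overlap_stage, 2), (overlap_stage, 1)]
)
```

Because the stages are data rather than hard-coded control flow, the second reranking step in `three_stage` needs no changes to `run_pipeline`; in practice the later stage would be a more expensive model applied to a smaller candidate set.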

Cited by 10 publications (11 citation statements)
References 19 publications

“…PyGaggle is designed specifically to work with Pyserini, but the latter was meant to be used independently, and we explicitly did not wish to "hard code" our own research agenda. This separation has made it easier for other neural IR toolkits to build on Pyserini, for example, the Capreolus toolkit [29,30].…”
Section: Future Developments (mentioning)
confidence: 99%
“…Since term importance varies from model to model, DiffIR allows the user to provide weight files that indicate the weight given to specific segments of a document. Capreolus [45] can generate these weight files for several neural reranking models by following one of several strategies for producing term importance weights. OpenNIR [28] can currently generate weight files for the EPIC model [30].…”
Section: Term Importance Weights (mentioning)
confidence: 99%
“…Recently, several tools have begun to incorporate automatic dataset acquisition. These include Capreolus [93], PyTerrier [58] and OpenNIR [55]. These reduce the user burden of finding the dataset source files and figuring out how to parse them correctly.…”
Section: Introduction (mentioning)
confidence: 99%
“…A document lookup API provides fast access to source documents, which is useful for recent text-based ranking models, such as those that use BERT [27]. PyTerrier [58], Capreolus [93], and OpenNIR [55] recently added support for ir_datasets, greatly expanding the number of datasets they support, and other tools like Anserini [91] can utilize our tool using the command line interface. Finally, the ir_datasets catalog acts as a documentation hub, making it easy to find datasets and learn about their characteristics.…”
Section: Introduction (mentioning)
confidence: 99%