Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021
DOI: 10.1145/3404835.3462804

MS MARCO: Benchmarking Ranking Models in the Large-Data Regime

Abstract: Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public leaderboards such as MS MARCO, are intended to encourage research and track our progress, addressing big questions in our field. However, the goal is not simply to identify which run is "best", achieving the top score. The goal is to move the field forward by developing robust new techniques that work in many different settings and are adopted in research and practice. This paper uses the MS MARCO and TREC Deep Learning Track as our case …

Cited by 41 publications (32 citation statements)
References 74 publications (79 reference statements)
“…A recent perspective paper by Craswell et al. [11] provides a complete exposition on the background and status of the MS MARCO project. That paper carefully and thoroughly addresses many common concerns regarding the MS MARCO datasets, including questions of internal validity, robust usefulness, and the reliability of statistical tests.…”
Section: MS MARCO
confidence: 99%
“…In this section, we provide only the background required to fully understand the work reported in the current paper. In particular, Craswell et al. [11] address concerns raised by Ferrante et al. [12], who apply measurement theory to draw attention to important shortcomings of established evaluation measures such as MRR. Many of these measures are not interval-scaled, so many common statistical tests are not permissible, and, strictly speaking, these measures should not even be averaged.…”
Section: MS MARCO
confidence: 99%
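
To make the measurement-theory concern concrete, the following minimal Python sketch (an editorial illustration, not code from either cited paper) computes MRR and shows that equal one-position rank changes produce very different score changes, which is the non-interval behaviour at issue; the per-query ranks used are hypothetical.

def reciprocal_rank(rank_of_first_relevant: int) -> float:
    # RR for one query: 1 / rank of the first relevant result.
    return 1.0 / rank_of_first_relevant

def mean_reciprocal_rank(ranks: list[int]) -> float:
    # MRR: the arithmetic mean of per-query reciprocal ranks.
    return sum(reciprocal_rank(r) for r in ranks) / len(ranks)

# A one-position drop near the top of the ranking costs 0.5 ...
print(reciprocal_rank(1) - reciprocal_rank(2))    # 0.5
# ... while the same one-position drop lower down costs only ~0.011.
print(reciprocal_rank(9) - reciprocal_rank(10))   # 0.0111...

# Hypothetical first-relevant-result ranks for five queries.
print(mean_reciprocal_rank([1, 2, 5, 10, 3]))     # 0.4266...

Because identical rank changes move the score by unequal amounts, averaging reciprocal ranks mixes unlike quantities, which is exactly the objection Ferrante et al. raise.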
“…In more detail, there can be a stack of complex re-rankers after the efficient first-stage retriever. This multi-stage cascaded architecture is very common and practical, both in industry (Yin et al., 2016; Liu et al., 2021d; Li and Xu, 2014) and on academic ranking leaderboards (Craswell et al., 2021). Given the large computational cost of Transformer-based pre-trained models, they are often employed as the last-stage re-ranker, whose goal is to re-rank the small set of documents provided by the previous stage.…”
Section: Pre-training Methods Applied In Re-ranking Component
confidence: 99%
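
As an editorial illustration of the cascaded architecture described in the quote above (not code from any of the cited systems), the self-contained Python sketch below runs a cheap lexical first stage over the whole corpus and applies a pretend-expensive re-ranker only to the surviving candidates; both scoring functions are toy stand-ins.

def lexical_score(query: str, doc: str) -> float:
    # Stage-1 scorer: cheap term-overlap count (a BM25 stand-in).
    q_terms = set(query.lower().split())
    return sum(1.0 for t in doc.lower().split() if t in q_terms)

def rerank_score(query: str, doc: str) -> float:
    # Stage-2 scorer: stand-in for an expensive cross-encoder.
    # Here: term overlap normalized by document length, as a toy example.
    return lexical_score(query, doc) / (1 + len(doc.split()))

def cascade_rank(query: str, corpus: dict[str, str],
                 k_first: int = 100, k_final: int = 10) -> list[str]:
    # Stage 1: recall-oriented candidate generation over the full corpus.
    candidates = sorted(corpus, key=lambda d: lexical_score(query, corpus[d]),
                        reverse=True)[:k_first]
    # Stage 2: precision-oriented re-scoring of the small candidate set,
    # where a Transformer's per-document cost would be affordable.
    return sorted(candidates, key=lambda d: rerank_score(query, corpus[d]),
                  reverse=True)[:k_final]

corpus = {
    "d1": "deep learning for ad hoc document ranking",
    "d2": "benchmarking ranking models on large datasets",
    "d3": "a survey of classical information retrieval",
}
print(cascade_rank("ranking models benchmark", corpus, k_first=3, k_final=2))

The design point is that the expensive scorer touches only k_first documents rather than the full collection, which is what makes Transformer-based re-rankers affordable in the last stage.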
“…-CWP200T, SogouT: CWP200T and SogouT (Luo et al., 2017). -MS MARCO: MS MARCO (Craswell et al., 2021) is a popular large-scale document collection containing about 3.2 million available documents drawn from the Bing search engine. In addition, 1 million non-question queries are included in the dataset for different retrieval tasks.…”
Section: Datasets For Pre-training
confidence: 99%
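
For readers who want to inspect the collection the quoted passage describes, the sketch below shows one way to iterate over MS MARCO, assuming the third-party ir_datasets package (https://ir-datasets.com/); the package and dataset identifiers are an assumption of this note, not part of the cited paper, and the first call triggers a large download.

import ir_datasets

# "msmarco-document" exposes the ~3.2M-document corpus described above;
# each document carries doc_id, url, title and body fields.
dataset = ir_datasets.load("msmarco-document")
for doc in dataset.docs_iter()[:3]:   # first three documents only
    print(doc.doc_id, doc.title[:60])

# The training split pairs the corpus with real Bing queries.
train = ir_datasets.load("msmarco-document/train")
for query in train.queries_iter():
    print(query.query_id, query.text)
    break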