2020
DOI: 10.1007/978-3-030-45442-5_4

Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants

Abstract: When researchers speak of BM25, it is not entirely clear which variant they mean, since many tweaks to Robertson et al.'s original formulation have been proposed. When practitioners speak of BM25, they most likely refer to the implementation in the Lucene open-source search library. Does this ambiguity "matter"? We attempt to answer this question with a large-scale reproducibility study of BM25, considering eight variants. Experiments on three newswire collections show that there are no significant effectivene…

Cited by 40 publications (29 citation statements)
References: 10 publications
“…Thus, this experiment allows us to isolate the effects of BM25 variants, although we must still manually ensure that every system uses the same parameter settings. In short, we have replicated previous replicability studies [6,14], but in a manner that supports cross-system comparisons. From Table 2, we see that effectiveness differences between the various systems with native document processing are larger than with Ciff.…”
Section: Case Study: BM25 Variants
Citation type: supporting
confidence: 84%
“…While these tracks are relatively new, similar initiatives have been surfacing during the last decade, e.g., as a workshop at ACM RecSys in 2013. These venues allow researchers to analyze the effect of different implementations of an approach, or explore the extent to which the results may change when a different dataset than the one reported in a paper is used (Kowald et al 2020; Kamphuis et al 2020). Often, the works published in these tracks contain publicly available code and/or datasets.…”
Section: Items
Citation type: mentioning
confidence: 99%
“…Often, the works published in these tracks contain publicly available code and/or datasets. However, a common complaint on these papers is the difficulty of perfectly reproducing the experiments from the papers, even when the code is available (which is not very common; Collberg and Proebsting 2016; Ferrari Dacrema et al 2019), and in some cases the attempts have to be discarded because inquiries sent to the original authors related to code or data remain unanswered (Ferrari Dacrema et al 2019; Lin and Zhang 2020; Kamphuis et al 2020).…”
Section: Items
Citation type: mentioning
confidence: 99%
“…We can also ask their preference on the subject headings recommendation generated using LM and VSM methods. Different retrieval methods, such as: probabilistic BM25 [30], [31] and sequential dependence model (SDM) [32], [33] can be considered for further study. The possibility of combining semantic information [25], [34], [35] or social media [36], [37] into the retrieval model to address lexical mismatch problem is a great challenge to be explored.…”
Section: Analysis Of Subject Headings Overlaps Of Documents From Different Faculties
Citation type: mentioning
confidence: 99%