Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management 2004
DOI: 10.1145/1031171.1031181

Simple BM25 extension to multiple weighted fields

Abstract: This paper describes a simple way of adapting the BM25 ranking formula to deal with structured documents. In the past it has been common to compute scores for the individual fields (e.g. title and body) independently and then combine these scores (typically linearly) to arrive at a final score for the document. We highlight how this approach can lead to poor performance by breaking the carefully constructed non-linear saturation of term frequency in the BM25 function. We propose a much more intuitive alternati…
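The contrast the abstract draws can be illustrated with a minimal sketch. It is not the paper's own code: the function names, the default `k1=1.2`, the unit `idf`, and the omission of length normalisation are all assumptions made for illustration. It compares scoring each field separately and adding the weighted scores against adding the weighted term frequencies first and saturating once.

```python
def bm25_saturation(tf, k1=1.2):
    """BM25 term-frequency saturation; length normalisation is omitted (assumption)."""
    return tf * (k1 + 1.0) / (tf + k1)


def score_linear_combination(field_tfs, weights, idf=1.0, k1=1.2):
    """Score every field on its own, then add the weighted field scores."""
    return sum(w * idf * bm25_saturation(tf, k1)
               for tf, w in zip(field_tfs, weights))


def score_weighted_tf(field_tfs, weights, idf=1.0, k1=1.2):
    """Add the weighted term frequencies first, then saturate once."""
    combined_tf = sum(w * tf for tf, w in zip(field_tfs, weights))
    return idf * bm25_saturation(combined_tf, k1)


# Hypothetical document: the query term occurs once in the title and once in the body.
field_tfs = [1, 1]
weights = [1.0, 1.0]

linear = score_linear_combination(field_tfs, weights)
combined = score_weighted_tf(field_tfs, weights)
print(f"linear combination of scores: {linear:.3f}")
print(f"weighted term frequencies:    {combined:.3f}")
```

In this toy case the score-level combination counts the two occurrences as fully additive, while the frequency-level combination lets the second occurrence contribute less than the first, which is the saturation behaviour the abstract says linear score combination breaks.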

Cited by 522 publications
(381 citation statements)
References 10 publications
“…Early work treated each field as a smaller document and simply combined field-level scores using linear combination or a mixture of probability models [16]. This straightforward combination of field-level scores was found to have limitations, resulting in efforts such as BM25F [17]. Recently, an adaptation of score combination and smoothing method was suggested [23] for the language modeling approach to IR, based on the search engine Indri [15] which supports combining evidence from multiple fields.…”
Section: Related Work (mentioning)
confidence: 99%
“…BM25F [17] is the modification of the BM25 model where field-level evidence is combined at the raw frequency level rather than score level. This maintains non-linear saturation of term frequencies.…”
Section: BM25F (mentioning)
confidence: 99%
“…using the term frequency in specific fields of structured documents (e.g. title, abstract) [11], or integrating query-independent evidence in the retrieval model in the form of prior probabilities for a document [3,6] ('prior' because they are known before the query). In short, when determining the relevance between a query and a document, most IR models use primarily query-dependent term statistics, and sometimes also add query-independent evidence to further enhance retrieval performance.…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, Robertson et al and Zaragoza et al proposed the per-field normalisation technique, which normalises term frequency on a per-field basis [14,18], by extending BM25's normalisation method [13]. The resulting field-based weighting model is called BM25F.…”
Section: Introduction (mentioning)
confidence: 99%
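The per-field normalisation mentioned in the statement above can be sketched as follows. This is an illustrative reading, not code from any of the cited papers: the dictionary keys, the field lengths, the per-field `b` values and the weights are hypothetical, and the saturation form `tf / (k1 + tf)` is assumed. Each field's term frequency is length-normalised with its own `b` parameter, the normalised frequencies are combined with field weights, and the result is saturated once.

```python
def normalised_field_tf(tf, field_len, avg_field_len, b):
    """Length-normalise a term frequency within a single field."""
    return tf / (1.0 - b + b * field_len / avg_field_len)


def bm25f_term_weight(fields, k1=1.2, idf=1.0):
    """Combine per-field normalised frequencies, then apply saturation once."""
    pseudo_tf = sum(
        f["weight"] * normalised_field_tf(f["tf"], f["len"], f["avg_len"], f["b"])
        for f in fields
    )
    return idf * pseudo_tf / (k1 + pseudo_tf)


# Hypothetical statistics for one query term in one document.
fields = [
    {"name": "title", "tf": 1, "len": 10, "avg_len": 10, "b": 0.5, "weight": 2.0},
    {"name": "body", "tf": 3, "len": 20, "avg_len": 10, "b": 0.5, "weight": 1.0},
]

print(f"term weight: {bm25f_term_weight(fields):.4f}")
```

Keeping a separate `b` per field lets a short field such as a title be normalised differently from a long body field, which is the point of normalising on a per-field basis.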
“…Using BM25F, the retrieval process is performed on indices of different document fields, such as body, title, and anchor text of incoming links. Following [14,18], Macdonald et al extended the PL2 DFR weighting model, by employing the per-field normalisation 2F [10]. Compared with tf normalisation on a single field, on one hand, per-field normalisation can significantly boost the retrieval performance, particularly for Web search [12,18].…”
Section: Introduction (mentioning)
confidence: 99%