2015
DOI: 10.1145/2699669

Two-Stage Document Length Normalization for Information Retrieval

Abstract: The standard approach for term frequency normalization is based only on the document length. However, it does not distinguish the verbosity from the scope, these being the two main factors determining the document length. Because the verbosity and scope have largely different effects on the increase in term frequency, the standard approach can easily suffer from insufficient or excessive penalization depending on the specific type of long document. To overcome these problems, this paper proposes two-stage norm…

Citations: cited by 12 publications (7 citation statements)
References: 57 publications
“…Interestingly, recent research has developed a two-stage document length normalisation framework [Na 2015] which incorporates both verbosity and scope normalisation into retrieval methods. It is appealing that the SPUD retrieval methods derived from our probabilistic framework contain these aspects of normalisation naturally.…”
Section: Theoretical Discussion (mentioning)
confidence: 99%
“…Fang et al [14] also proposed the use of perturbed document collections to gather further insights on retrieval functions fulfilling the same set of axioms. This approach has not been followed-up upon in works other than [31].…”
Section: Related Work (mentioning)
confidence: 99%
“…where k, k′, and b are constants and Δ is the average document length. The TF component of the BM25 ranking function incorporates document length normalization, which ensures long documents are not excessively favored over short documents in retrieval (Na, 2015; Singhal et al, 1996). Instead of a simple normalization by the document length |d|, the normalization in BM25 takes into account that the length of a document may depend on the document's verbosity and scope (Robertson & Walker, 1994).…”
Section: Related Work (mentioning)
confidence: 99%
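
The length-normalized TF component described in the statement above can be sketched roughly as follows. This is a minimal illustration using the commonly cited BM25 form, not the formula from the citing paper, which is elided here. The function name `bm25_tf`, the defaults k = 1.2 and b = 0.75, and the sample numbers are assumptions; the second constant k′ mentioned in the statement is not modeled.

```python
def bm25_tf(tf: float, doc_len: float, avg_doc_len: float,
            k: float = 1.2, b: float = 0.75) -> float:
    """Length-normalized TF component in the commonly cited BM25 form.

    All parameter names and defaults are illustrative assumptions.
    """
    if tf <= 0:
        return 0.0
    # Pivoted length normalization: b = 0 ignores document length,
    # b = 1 scales fully by doc_len relative to the average length.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Saturating TF: the score grows with tf but is bounded by k + 1.
    return tf * (k + 1.0) / (tf + k * norm)


if __name__ == "__main__":
    avg_len = 100.0  # plays the role of the average document length (Δ)
    short_doc = bm25_tf(tf=3, doc_len=50, avg_doc_len=avg_len)
    long_doc = bm25_tf(tf=3, doc_len=400, avg_doc_len=avg_len)
    # With the same raw TF, the longer document receives the lower score.
    print(short_doc, long_doc)
```

Note that this form penalizes length alone: a document that is long because it is verbose and one that is long because it covers a broad scope receive the same penalty for the same length.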
“…A document d may be represented as a vector with each dimension and its value corresponding to a term t in d and the TF, respectively. The use of TF normalized by the document length in ranking functions can enhance retrieval effectiveness (Na, 2015; Singhal et al, 1996). We denote the normalized TF by f(t, d) and represent the document with normalized TF, d, by a set of tuples, that is,…”
Section: Theoretical Foundation (mentioning)
confidence: 99%