SIGIR '94 (1994)
DOI: 10.1007/978-1-4471-2099-5_24
Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval

Citations: Cited by 769 publications (672 citation statements)
References: References 6 publications
“…Keywords are likely to be frequent (at least inside the clusters), and documents containing many of those terms will be promoted in the ranked list by the prior. This accords with the scope hypothesis [9]: documents covering many topics are more likely to be relevant.…”
Section: Probabilistic Prior
confidence: 98%
“…Then, our first document length based prior is proportional to document length. The intuition behind this prior is that longer documents span more topics and are more likely to be relevant if no query has been seen (denoted as the scope hypothesis in [9]). It has been reported that this prior increases retrieval performance on the WT10G collection by up to 0.03 on an absolute scale [6].…”
Section: Linear Prior
confidence: 99%
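As a concrete illustration of the linear length prior described in this excerpt, here is a minimal sketch (not taken from the cited papers): the prior P(d) is made proportional to document length and combined with a query-dependent log-score before ranking. The function names, the form of the query score, and the toy data are assumptions for illustration only.

```python
import math

def length_prior(doc_len, total_len):
    """Linear document-length prior: P(d) proportional to |d| (scope hypothesis)."""
    return doc_len / total_len

def rank_with_prior(query_log_scores, doc_lengths):
    """Rank documents by query log-score plus the log of the length prior.

    query_log_scores: dict doc_id -> query-dependent log-score (e.g. log P(q|d))
    doc_lengths:      dict doc_id -> document length in tokens
    """
    total_len = sum(doc_lengths.values())
    return sorted(
        query_log_scores,
        key=lambda d: query_log_scores[d] + math.log(length_prior(doc_lengths[d], total_len)),
        reverse=True,
    )

# Toy example: with equal query scores, the longer document is promoted by the prior.
print(rank_with_prior({"d1": -2.0, "d2": -2.0}, {"d1": 150, "d2": 600}))  # ['d2', 'd1']
```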
“…Effective information retrieval models generally capture three heuristics, i.e., TF weighting, IDF weighting, and document length normalization [36]. One effective way to assign weights to terms when representing a document as a weighted term vector is the BM25 term weighting method [78], where the normalized TF not only addresses length normalization but also has an upper bound, which improves robustness by avoiding overly rewarding the matching of any particular term. A document can also be represented as a probability distribution over words (i.e., a unigram language model), and the similarity can then be measured with an information-theoretic measure such as cross entropy or Kullback-Leibler divergence [105].…”
Section: Distance-based Clustering Algorithms
confidence: 99%
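For reference, a minimal sketch of the BM25 weighting mentioned in this excerpt, showing the saturating, length-normalized TF component and an IDF component. The parameter values k1 = 1.2 and b = 0.75 are common defaults, and the variable names and IDF variant are assumptions rather than details taken from the cited papers.

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """BM25 score of one document for a bag-of-words query.

    query_terms : list of query terms
    doc_tf      : dict term -> raw term frequency in the document
    doc_len     : document length in tokens
    avg_doc_len : average document length in the collection
    df          : dict term -> number of documents containing the term
    n_docs      : total number of documents in the collection
    """
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0:
            continue
        # IDF-style component (Robertson/Sparck Jones form, smoothed).
        idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
        # Length-normalized TF, bounded above by k1 + 1, so no single term dominates.
        norm_tf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm_tf
    return score
```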
“…The comparison is between the BM25 weighting scheme (Okapi) [12][13] and our approach. We can note from the figure above (Figure 3) that precision is better for our approach at every precision point.…”
Section: Comparison
confidence: 99%
“…For an efficient Information Retrieval System (IRS), these two sets must coincide as often as possible. The relevance of a document to a query is usually interpreted by most IR models (vector space [14], probabilistic [12][13][18], inference and belief networks [20][11][17]) as a score computed by summing the products of term weights in the document and query representations. Whatever model is used, the response to a user need is a list of documents ranked according to a relevance value.…”
Section: Introduction
confidence: 99%
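As an illustration of the scoring scheme this excerpt describes, here is a minimal sketch of ranking by the inner product of query and document term weights. The weighting scheme behind the toy weights (e.g. TF-IDF) and the data are assumptions for illustration, not the cited models themselves.

```python
def inner_product_score(query_weights, doc_weights):
    """Score = sum over shared terms of (query weight * document weight)."""
    return sum(w * doc_weights.get(t, 0.0) for t, w in query_weights.items())

def rank(query_weights, docs):
    """Return doc ids ordered by decreasing inner-product relevance score."""
    return sorted(docs, key=lambda d: inner_product_score(query_weights, docs[d]), reverse=True)

# Toy example with pre-computed term weights (e.g. TF-IDF).
docs = {
    "d1": {"retrieval": 0.8, "model": 0.3},
    "d2": {"retrieval": 0.2, "probabilistic": 0.9},
}
print(rank({"probabilistic": 1.0, "retrieval": 0.5}, docs))  # ['d2', 'd1']
```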