2004
DOI: 10.1007/s10115-003-0115-8

Query-Sensitive Similarity Measures for Information Retrieval

Abstract: The application of document clustering to information retrieval has been motivated by the potential effectiveness gains postulated by the cluster hypothesis. The hypothesis states that relevant documents tend to be highly similar to each other, and therefore tend to appear in the same clusters. In this paper we propose an axiomatic view of the hypothesis, by suggesting that documents relevant to the same query (co-relevant documents) display an inherent similarity to each other which is dictated by t…

Citations: cited by 37 publications (22 citation statements)
References: 26 publications (46 reference statements)
“…Outlier-based Re-ranking Method According to the clustering hypothesis [Rijsbergen 1979] [Tombros and van 2004], the topically-relevant documents tend to cluster together, while the irrelevant ones would be scattered. By considering the scattered irrelevant documents as outlier documents, we then propose to use outlier detection methods to automatically detect the irrelevant documents.…”
Section: Three Re-ranking Methods (mentioning)
confidence: 99%
“…Recent research in inter-document similarity [20,21] has suggested that similarity measures that take the query into account are more effective than conventional measures. This class of similarity measures is called query-sensitive (QSSM).…”
Section: Structure (mentioning)
confidence: 99%
“…For each query, we retrieve the top 100 documents and use them for our study. In [12,20,21] it has been demonstrated that using relationships from among documents ranked high by an IR system in response to a query, is more effective than using relationships from entire document collections.…”
Section: Experimental Environment (mentioning)
confidence: 99%
“…For the purpose of query performance prediction, this measure is tailored further using the query-dependant extension of the dotproduct described by Tombros and van Rijsbergen in [TvR04]. Amongst the alternatives suggested, the similarity between two documents is calculated here as the product of their cosine dot product and the query-dependant component.…”
Section: Approximation of the Cox-Lewis Statistic (mentioning)
confidence: 99%
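
The last statement above describes a similarity computed as the product of the documents' cosine similarity and a query-dependent component. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the cited authors' implementation: the query-dependent component is assumed here to be the cosine between the query and the vector of terms the two documents share, and the names `cosine`, `common_terms`, and `query_sensitive_similarity` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def common_terms(a, b):
    """Vector of the terms shared by both documents.

    Assumption for illustration: a shared term's weight is the mean of
    its weights in the two documents.
    """
    return {term: (a[term] + b[term]) / 2 for term in a if term in b}


def query_sensitive_similarity(doc_i, doc_j, query):
    """Product of the plain cosine and an assumed query-dependent component."""
    conventional = cosine(doc_i, doc_j)
    query_component = cosine(common_terms(doc_i, doc_j), query)
    return conventional * query_component


if __name__ == "__main__":
    d1 = {"cluster": 1.0, "retrieval": 1.0}
    d2 = {"cluster": 1.0, "retrieval": 1.0}
    q = {"cluster": 1.0}
    print(query_sensitive_similarity(d1, d2, q))
```

Because the two factors are multiplied, a pair of documents scores zero whenever they share no terms or their shared terms do not overlap with the query, however similar the documents are otherwise.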