2007
DOI: 10.1016/j.ipm.2006.09.016

s-grams: Defining generalized n-grams for information retrieval

Cited by 36 publications (26 citation statements)
References 14 publications
“…Sometimes, the terms may be very similar, but not identical, due to misspelling or different prefixes/suffixes. To capture content similarity even in those cases, we adopt a Jaccard index on character tri-grams [Järvelin et al 2007]. Let T (q) be the character tri-grams from the terms of query q, we define the similarity σ jaccard as follows.…”
Section: Unsupervised Learning Approach
Citation type: mentioning
confidence: 99%
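
The statement above elides the citing paper's formula, so its exact definition is not reproduced here. As a rough illustration only, the sketch below assumes the standard Jaccard index, |A ∩ B| / |A ∪ B|, applied to the sets of character tri-grams of two queries. The function names (`char_trigrams`, `query_trigrams`, `jaccard_similarity`), the lowercasing, the whitespace tokenization, and the absence of padding are all assumptions made for this example, not details taken from the cited work.

```python
def char_trigrams(term):
    """Return the set of character tri-grams of one term (no padding; an assumption)."""
    term = term.lower()
    # Terms shorter than three characters yield an empty set.
    return {term[i:i + 3] for i in range(len(term) - 2)}


def query_trigrams(query):
    """Collect the tri-grams of every whitespace-separated term in a query."""
    grams = set()
    for term in query.split():
        grams |= char_trigrams(term)
    return grams


def jaccard_similarity(query_a, query_b):
    """Standard Jaccard index over the two tri-gram sets (assumed definition)."""
    a = query_trigrams(query_a)
    b = query_trigrams(query_b)
    if not a and not b:
        return 0.0  # Convention chosen here for two empty sets.
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # A misspelled term still shares some tri-grams with the correct one.
    print(jaccard_similarity("retrieval", "retreival"))
```

Under these assumptions, "retrieval" and "retreival" share three tri-grams ("ret", "etr", "val") out of eleven distinct ones, so the score stays above zero even though the terms are not identical.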
“…Sometimes, such terms may be very similar, but not identical, due to misspelling, or different prefixes/suffixes. To capture content distance between queries, we adopt a Jaccard index on tri-grams [10]. Let T (q) be the tri-grams resulting from the terms of query q, we define the distance µ jaccard as follows:…”
Section: Feature Selection
Citation type: mentioning
confidence: 99%
“…They can also be traced down to their word-stem. This stemming makes it possible to combine sentences in different tenses without having them in different grammatical forms [5]. Another step that usually needs to be taken is that stop and small words need to be deleted since they do not contain relevant information of the text and do not need to be analyzed by the machine.…”
Section: Discussion Of Methods
Citation type: mentioning
confidence: 99%
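
The preprocessing described in the statement above (reducing words to a stem, then dropping stop words and short words) could look roughly like the sketch below. The stop-word list, the suffix list, the minimum-length threshold, and the function names are hypothetical choices made for illustration. The suffix stripper is deliberately naive and is not the stemmer used in the citing paper, which the excerpt does not name.

```python
# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "was"}

# Suffixes tried in order; a crude stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def preprocess(text, min_length=3):
    """Lowercase, split on whitespace, drop stop and short words, then stem."""
    tokens = text.lower().split()
    kept = [t for t in tokens if t not in STOP_WORDS and len(t) >= min_length]
    return [naive_stem(t) for t in kept]


if __name__ == "__main__":
    print(preprocess("The cat walked and the cats are walking"))
```

In this sketch "walked" and "walking" both reduce to "walk", which is the effect the statement describes: the same word in different tenses is treated as one term. Punctuation handling is omitted for brevity.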