Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 2013
DOI: 10.1145/2484028.2484142

The bag-of-repeats representation of documents

Abstract: n-gram representations of documents may improve over a simple bag-of-words representation by relaxing the independence assumption between words and introducing context. However, this comes at the cost of adding features that are nondescriptive and of increasing the dimension of the vector space model exponentially. We present new representations that avoid both pitfalls. They are based on sound theoretical notions of stringology, and can be computed in optimal asymptotic time with algorithms using data structures from th…
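To make the dimensionality concern in the abstract concrete, the following is a minimal sketch in Python that counts the distinct features produced when word n-grams up to a given length are added to a bag-of-words model. It illustrates plain n-gram extraction only, not the representations the paper proposes. The helper names `ngram_features` and `vocabulary_size` and the two toy documents are assumptions introduced here for illustration.

```python
from collections import Counter


def ngram_features(tokens, n):
    """Count every contiguous word n-gram of length exactly n (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def vocabulary_size(docs, max_n):
    """Number of distinct features when all n-grams with 1 <= n <= max_n are kept."""
    vocab = set()
    for tokens in docs:
        for n in range(1, max_n + 1):
            vocab.update(ngram_features(tokens, n))
    return len(vocab)


# Toy corpus, assumed for illustration only.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]

for max_n in (1, 2, 3):
    print(max_n, vocabulary_size(docs, max_n))
```

On this toy corpus the feature count grows from 7 (words only) to 15 and then 22 as bigrams and trigrams are added. In general the number of possible n-grams over a vocabulary of size V is V^n, although the number actually observed in a corpus is bounded by the corpus length.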

Cited by 9 publications
(1 citation statement)
References 12 publications
“…As a repercussion, we separate inquiries of this kind that require illness derivation from different sorts. It merits underscoring that large-scale data frequently prompts blast of highlight space in the lights of ngram representation [5] [6] particularly for the group created conflicting data. To keep away from this issue, we use the medicinal phrasings to speak to our information.…”
Section: Introduction (citation type: mentioning)
confidence: 99%