2011 11th International Conference on Intelligent Systems Design and Applications
DOI: 10.1109/isda.2011.6121796
TF-SIDF: Term frequency, sketched inverse document frequency

Cited by 9 publications (6 citation statements)
References 9 publications
“…This is to obtain a better representation of each user's individual writing habits, as opposed to the writing habits of them and their peers. These emails combine to produce a dictionary of words from which term frequency-inverse document frequency (TF-IDF) scores are calculated [17]. TF-IDF scores are commonly used for identifying important words in text analysis, and are computed using…” (footnote: http://www.cs.cmu.edu/~enron)
Section: Enron Example (mentioning, confidence: 99%)
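The TF-IDF computation described in the quote above can be sketched in a few lines. This is a minimal illustration with hypothetical toy data, not the cited paper's implementation; it uses length-normalised raw term counts for TF and log(N / df) for IDF, one common variant among several.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF scores for a small corpus.
    docs: list of token lists. Returns one {term: score} dict per document.
    TF is the raw term count normalised by document length; IDF is
    log(N / df), where df is the number of documents containing the term."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in docs:
        tf = Counter(tokens)
        length = len(tokens)
        scores.append({t: (c / length) * math.log(n / df[t])
                       for t, c in tf.items()})
    return scores

# Toy corpus standing in for the per-user email dictionaries (invented data):
corpus = [["meeting", "agenda", "meeting"],
          ["agenda", "lunch"],
          ["lunch", "lunch", "report"]]
weights = tf_idf(corpus)
```

A term that is frequent in one document but rare across the corpus (here, "meeting") receives a higher weight than a term shared across documents ("agenda"), which is exactly the behaviour that makes TF-IDF useful for identifying important words.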
“…In (Shi et al., 2009a), the idea of using a low-dimensional sketch (Cormode and Muthukrishnan, 2005) to approximate the TF-IDF representation was applied to large-scale corpora, but it was not explored in data-stream settings. Recently, Baena-García et al. (2011) extended this method to allow efficient representation of massive streams of documents, but the effects of this approximation on classification tasks were not analyzed.…”
Section: Additional Related Work (mentioning, confidence: 99%)
“…To our knowledge, the computationally efficient versions of RP described in (Ailon and Chazelle, 2010) have still not been explicitly studied in text domains. Previously, Baena-García et al. (2011) proposed using the count-min sketch to allow efficient computation of IDF for massive streams of documents, studying the similarity between the ranking of the exact TF-IDF values and that of the approximate values obtained from approximate IDF. However, this algorithm works with exact TF, and the authors do not assess (theoretically or empirically) the effects of this approximation on document classification tasks.…”
Section: Text Representation (mentioning, confidence: 99%)
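The count-min sketch mentioned in the quote above replaces the full term-to-document-frequency dictionary with a fixed-size counter table, so document frequencies (and hence IDF) can be estimated over an unbounded document stream in constant memory. The following is an illustrative sketch with invented parameters and data, not the cited algorithm's exact implementation; the key property is that estimates never undercount, they can only overcount due to hash collisions.

```python
import hashlib
import math

class CountMinSketch:
    """Count-min sketch: depth rows of width counters. Each item is hashed
    into one counter per row; its estimated count is the minimum over rows
    (an overestimate of the true count, never an underestimate)."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, item, row):
        # One independent-ish hash per row, derived from a salted MD5 digest.
        digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._hash(item, row)] += count

    def estimate(self, item):
        return min(self.table[row][self._hash(item, row)]
                   for row in range(self.depth))

# Approximate document frequencies for a stream: each document's set of
# distinct terms is fed into the sketch, and IDF is derived from estimates.
docs = [{"sketch", "idf"}, {"sketch", "stream"}, {"idf", "corpus"}]
cms = CountMinSketch()
for terms in docs:
    for t in terms:
        cms.add(t)

n_docs = len(docs)
approx_idf = {t: math.log(n_docs / cms.estimate(t))
              for t in ("sketch", "idf", "stream", "corpus")}
```

Because the sketch only overcounts, the approximate IDF is a lower bound on the exact IDF for each term, which is what makes the ranking comparison studied in the cited work meaningful.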
“…In order to index the textual documents, we used the Vector Space Model (also known as VSM) [12]. Each document is indexed by its terms in a vector, and each term is weighted by means of the TF-IDF (Term Frequency-Inverse Document Frequency) function [10]. The representation model generates very high dimensionality even after preprocessing and cleaning.…”
Section: The Architecture Of Our Learning System (mentioning, confidence: 99%)
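In the Vector Space Model described above, each document becomes a high-dimensional vector of term weights, and documents are typically compared by cosine similarity over those vectors. A minimal sketch, with hypothetical TF-IDF weights, storing only nonzero entries as dicts (the usual way such sparse vectors are kept in memory):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts (VSM vectors).
    Missing terms are treated as zero-weight, so only nonzero entries
    need to be stored despite the high nominal dimensionality."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical TF-IDF-weighted document vectors (invented values):
d1 = {"learning": 0.8, "system": 0.5}
d2 = {"learning": 0.6, "architecture": 0.7}
sim = cosine(d1, d2)
```

A document is maximally similar to itself (similarity 1.0), and two documents sharing no terms score 0.0, which is why cosine over TF-IDF vectors is a standard retrieval and classification kernel in VSM systems.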