2006 5th International Conference on Machine Learning and Applications (ICMLA'06) 2006
DOI: 10.1109/icmla.2006.50

TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams

Abstract: In this paper, we propose a new term weighting scheme called Term Frequency-Inverse Corpus Frequency (TF-ICF). It does not require term frequency information from other documents within the document collection and thus enables us to generate the document vectors of N streaming documents in linear time. In the context of a machine learning application, unsupervised document clustering, we evaluated the effectiveness of the proposed approach in comparison to five widely used term weighting schemes through e…
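The idea in the abstract can be illustrated with a minimal Python sketch. The abstract shown here does not give the weighting formula, so the form used below, log(1 + tf) multiplied by log((N + 1) / (n + 1)), with N and n taken from a fixed reference corpus rather than from the stream, is an assumption, as are the names `tf_icf_vector`, `corpus_doc_freq`, and `corpus_size` and the final length normalization. Because each document is weighted using only its own term counts plus static corpus statistics, N streaming documents can be vectorized in one pass.

```python
import math
from collections import Counter


def tf_icf_vector(tokens, corpus_doc_freq, corpus_size):
    """Weight one document's terms using a static reference corpus.

    tokens          -- list of (already stemmed) terms in the document
    corpus_doc_freq -- hypothetical dict: term -> number of reference-corpus
                       documents containing that term
    corpus_size     -- number of documents in the reference corpus
    """
    counts = Counter(tokens)
    weights = {}
    for term, tf in counts.items():
        n = corpus_doc_freq.get(term, 0)  # unseen terms get the largest ICF
        # Assumed TF-ICF form: damped term frequency times inverse corpus frequency.
        weights[term] = math.log(1 + tf) * math.log((corpus_size + 1) / (n + 1))

    # Assumed extra step: scale to unit length so documents are comparable.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm > 0:
        weights = {term: w / norm for term, w in weights.items()}
    return weights


# Illustrative, made-up corpus statistics (not taken from the paper).
reference_doc_freq = {"the": 999, "stream": 9}
vector = tf_icf_vector(["the", "the", "stream"], reference_doc_freq, 1000)
```

Nothing in `tf_icf_vector` depends on other documents in the stream, which is the property the abstract credits for the linear-time behavior; a TF-IDF variant would instead need document frequencies that change as each new document arrives.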

Cited by 106 publications (61 citation statements)
References: 14 publications
“…We then stemmed the document content using a Porter Stemming algorithm [16]. Finally, we generated a term frequency list using TF-ICF [7] and normalized these frequencies for direct document comparison.…”
Section: Documents | mentioning
confidence: 99%
“…TFIDF is the most widely used traditional method. In addition, along with Okapi BM25 [7], there are various methods that are variations of TFIDF [8], [9]. It is also possible to categorize the traditional methods as local or global in scope.…”
Section: Related Work | unclassified
“…RelDF (Relative Document Frequency) is a method proposed for document filtering that assumes terms on a given topic should be observed more often in relevant documents [39]. The TFICF (Term Frequency-Inverse Corpus Frequency) method is independent of a term's global frequency and was proposed for dynamic document clustering [9]. (1 + )…”
Section: New Approaches | unclassified
“…From the literature on non-learning statistical TWS, we found that most of the TWS proposed by researchers are a variation of the TF-IDF weighting scheme (Reed et al., 2006; Salton and Buckley, 1988; Sparck Jones, 1988).…”
Section: Traditional TWS | mentioning
confidence: 99%
“…The above and other TWS in the literature (Reed et al., 2006; Greengrass, 2000; McGill, 1979) use some of the document collection characteristics, such as the total numbers of documents in the collection and the document term frequency (number of documents in the document collection that contain this term). In real-world IR systems, these characteristics should be considered as changing over time because nowadays document collections are mostly dynamic instead of static.…”
Section: Pivoted Document Length Normalization-idf (Ltu) | mentioning
confidence: 99%