Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2002
DOI: 10.1145/775047.775076
Enhanced word clustering for hierarchical text classification

Abstract: In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering" of features has been found to achieve improvements over feature selection in terms of classification accuracy, especially at lower numbers of features [2,28]. However, the existing clustering techniques are agglomerative in nature and result in (i) sub-optimal word clusters and (ii) high computational cost. In order to explicitly capture t…
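The divisive approach the abstract describes can be illustrated with a small sketch: a k-means-style loop that represents each word by its class distribution p(C|w) and repeatedly assigns it to the word cluster whose distribution is nearest in KL divergence. This is an illustrative reconstruction, not the paper's exact algorithm; the toy data, uniform word priors, and cluster count are invented for the example.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions."""
    p, q = np.clip(p, eps, None), np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def divisive_kl_cluster(P, priors, k, iters=20, seed=0):
    """Cluster rows of P (row i = p(C | word i)) into k word clusters
    by k-means with KL divergence as the distortion measure."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    assign = rng.integers(0, k, size=n)
    for _ in range(iters):
        # Update step: each cluster's distribution is the prior-weighted
        # mean of its members' distributions (uniform fallback if empty).
        centers = np.full((k, P.shape[1]), 1.0 / P.shape[1])
        for j in range(k):
            members = assign == j
            if members.any():
                w = priors[members] / priors[members].sum()
                centers[j] = w @ P[members]
        # Assignment step: move each word to the KL-nearest cluster.
        new = np.array([min(range(k), key=lambda j: kl(P[i], centers[j]))
                        for i in range(n)])
        if np.array_equal(new, assign):
            break
        assign = new
    return assign

# Toy data: four words over two classes, forming two obvious groups.
P = np.array([[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8]])
priors = np.full(4, 0.25)
labels = divisive_kl_cluster(P, priors, k=2)
```

Because each pass reassigns all words at once against the current cluster distributions, the loop avoids the quadratic merge cost of agglomerative schemes.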





Cited by 95 publications (54 citation statements) | References 20 publications
“…The works of (Slonim & Tishby, 2000) and (Yaniv & Souroujon, 2001) use heuristic procedures to cluster documents and features independently using an agglomerative algorithm. (Dhillon et al., 2002, 2003b), on the other hand, propose an information-theoretic co-clustering algorithm that intertwines row (feature) and column (document) clustering. The algorithm starts with a random partition of rows, X, and columns, Y, and computes an approximation q(X,Y) to the original distribution P(X,Y), along with a corresponding compressed distribution, by clustering rows and columns in an intertwined fashion, i.e.…”
Section: Co-clustering (Clustering Features and Documents)
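The intertwined row/column procedure described in the quotation can be sketched as an alternating scheme: compress the columns into their current column clusters, re-cluster the rows against the compressed matrix, then repeat with the roles swapped. This is a simplified illustration of the co-clustering idea, not the exact update rule of Dhillon et al.; the block-structured toy matrix and the deterministic initial partitions are invented for the example.

```python
import numpy as np

def _kl_assign(D, centers, eps=1e-12):
    """Assign each row of D to the KL-nearest center.
    argmin_j KL(d || c_j) = argmin_j -(d . log c_j), because the
    entropy term of d is constant across centers."""
    D = np.clip(D, eps, None)
    centers = np.clip(centers, eps, None)
    return (-(D @ np.log(centers).T)).argmin(axis=1)

def _centers(D, assign, k):
    """Mean of each cluster's rows; uniform fallback for empty clusters."""
    C = np.full((k, D.shape[1]), 1.0 / D.shape[1])
    for j in range(k):
        if (assign == j).any():
            C[j] = D[assign == j].mean(axis=0)
    return C

def co_cluster(P, k_rows, k_cols, r, c, iters=10):
    """Alternately refine row clusters r and column clusters c of the
    joint distribution P (rows = features, columns = documents)."""
    P = P / P.sum()
    for _ in range(iters):
        # Compress columns into column clusters, then re-cluster rows
        # by their distributions over the column clusters.
        Pc = np.stack([P[:, c == j].sum(axis=1) for j in range(k_cols)], axis=1)
        rowdist = Pc / np.clip(Pc.sum(axis=1, keepdims=True), 1e-12, None)
        r = _kl_assign(rowdist, _centers(rowdist, r, k_rows))
        # Same step with the roles swapped: compress rows, re-cluster columns.
        Pr = np.stack([P[r == i].sum(axis=0) for i in range(k_rows)], axis=0)
        coldist = (Pr / np.clip(Pr.sum(axis=0, keepdims=True), 1e-12, None)).T
        c = _kl_assign(coldist, _centers(coldist, c, k_cols))
    return r, c

# Block-structured toy joint distribution: rows 0-1 co-occur with
# columns 0-1, and rows 2-3 with columns 2-3.
P = np.array([[4., 2., 0., 0.],
              [4., 2., 0., 0.],
              [0., 0., 1., 5.],
              [0., 0., 1., 5.]])
r, c = co_cluster(P, 2, 2, r=np.array([0, 0, 0, 0]), c=np.array([0, 1, 1, 1]))
```

Even from a poor starting partition, the alternating passes pull the row and column clusters toward the block structure of the matrix, mirroring how the intertwined updates reinforce each other.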
“…Experiments conducted demonstrate the efficiency of the algorithm, especially in the presence of sparsity. (Dai et al., 2007) extend the co-clustering algorithm of (Dhillon et al., 2002, 2003b) and present a co-clustering classification algorithm (CoCC) that focuses on classifying documents across different text domains. There is a labelled data set D_i from one domain, called in-domain, and an unlabelled data set D_o from a related but different domain, called out-of-domain, that is to be classified.…”
Section: Co-clustering (Clustering Features and Documents)
“…A connection between multinomial model-based clustering and the divisive Kullback-Leibler clustering (Dhillon et al., 2002; Dhillon & Guan, 2003) is worth mentioning here. It is briefly mentioned in Dhillon and Guan (2003), but they did not explicitly stress that divisive KL clustering is equivalent to multinomial model-based k-means, which maximizes the following objective function:…”
Section: Multinomial Models
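The equivalence noted in the quotation can be checked numerically: picking the cluster with maximal multinomial log-likelihood selects the same cluster as picking the distribution with minimal KL divergence from the document's empirical distribution, because the two objectives differ only by terms that do not depend on the cluster. The counts and cluster parameters below are invented for the illustration.

```python
import numpy as np

counts = np.array([6, 3, 1])             # word counts of one document
thetas = np.array([[0.6, 0.3, 0.1],      # two candidate cluster multinomials
                   [0.2, 0.2, 0.6]])

# Multinomial log-likelihood (dropping the count-only combinatorial term).
loglik = counts @ np.log(thetas).T

# KL divergence from the empirical distribution to each cluster model:
# KL(p || theta) = sum p log p - p . log theta.  The cluster-dependent
# part is -p . log theta, i.e. -loglik / N, so the rankings coincide.
p = counts / counts.sum()
kl = (p * np.log(p)).sum() - p @ np.log(thetas).T

best_by_loglik = int(loglik.argmax())
best_by_kl = int(kl.argmin())
```

Both criteria choose the first cluster here, whose parameters match the document's empirical word frequencies.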
“…24; Rasmussen 1992; Silverstein and Pedersen 1997; Jain et al. 1999; Manning and Schütze 2001: ch. 14; Everitt et al. 2001; Duda et al. 2001; Dhillon et al. 2002; Berkhin 2000; Yao and Choi 2003).…”
Section: Word Clustering