2010
DOI: 10.1007/s10618-010-0172-z

Hierarchical document clustering using local patterns

Abstract: The global pattern mining step in existing pattern-based hierarchical clustering algorithms may result in an unpredictable number of patterns. In this paper, we propose IDHC, a pattern-based hierarchical clustering algorithm that builds a cluster hierarchy without mining for globally significant patterns. IDHC first discovers locally promising patterns by allowing each instance to "vote" for its representative size-2 patterns in a way that ensures an effective balance between local pattern frequency and patter…

Cited by 24 publications (13 citation statements)
References 21 publications
“…Malik et al proposed a pattern-based instance-driven hierarchical clustering algorithm (IDHC) that builds a cluster hierarchy without mining for globally significant patterns. [21] While Antonio et al presented a divisive hierarchical method which is based on the use of the k-means method embedded in a recursive algorithm to obtain a clustering at each node of the hierarchy. [4] Liao et al introduces a sample-based hierarchical adaptive K-means (SHAKM) clustering algorithm, which employs multilevel random sampling to handle large databases and utilizes the adaptive K-means clustering algorithm to determine the correct number of clusters.…”
Section: Reviews Of Clustering Methods (mentioning)
confidence: 99%
“…More recently Fukumoto and Suzuki performed cluster labeling by relying on concepts in a machine readable dictionary (Fukumoto and Suzuki, 2011) with positive results. In another distinct recent work, Malik, et al, focused on finding patterns (i.e., labels) and clusters simultaneously as an alternative to explicitly identifying labels for existing clusters (Malik et al, 2010).…”
Section: Related Work (mentioning)
confidence: 99%
“…The number of clusters k was set to obtain clusters that averaged 100 documents (i.e., if clustering 1000 documents, k = 10). We additionally used a more recent clustering algorithm, IDHC [9], which is different than k-means in that it does not take a parameter k and produces a variable number of clusters.…”
Section: A Experimental Setup (mentioning)
confidence: 99%