2006
DOI: 10.1109/icdm.2006.81

High Quality, Efficient Hierarchical Document Clustering Using Closed Interesting Itemsets

Citations: cited by 26 publications (36 citation statements)
References: 12 publications
“…Method build-model Next, in order to determine the pattern significance at class and global levels, we use a common 2 x 2 contingency-table-based interestingness measure. A recent study [19] evaluated most of the interestingness measures found in [12,13], in the context of hierarchical document clustering, and reported that only a small number of interestingness measures generalize well to datasets with varying characteristics. Coincidently, we found that the same measures (in a slightly different order) are useful to determine class and global significance values for pattern-based classification.…”
Section: Building the Classification Model (mentioning)
confidence: 99%
“…We first evaluated the effectiveness of various interestingness measures [12,13], to determine global and class significance values (i.e., Section 2.2) on a number of datasets, and found that the top measures reported in [19], in the context of hierarchical document clustering, also consistently performed well in our context (in a slightly different order). We observe that Added Value generally outperformed other measures, while Mutual Information, ChiSquare, and Yule's Q achieved very close (i.e., within a few-percent range) classification performance.…”
Section: Classification Performance (mentioning)
confidence: 99%
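
The contingency-table measures named in the statements above can be illustrated with a minimal sketch. It uses the common textbook definitions of Added Value, Yule's Q, and Chi-Square for a 2 x 2 table; the citing papers may use different variants, and the function name, argument names, and example counts are assumptions made for illustration. Mutual Information is omitted for brevity, and all table margins are assumed to be nonzero.

```python
def contingency_measures(f11, f10, f01, f00):
    """Interestingness of a rule A -> B from a 2 x 2 contingency table.

    f11: count with A and B      f10: count with A, without B
    f01: count with B, without A f00: count with neither
    Assumes every row and column total is nonzero (illustrative sketch).
    """
    n = f11 + f10 + f01 + f00
    a_total = f11 + f10          # records containing A
    b_total = f11 + f01          # records containing B
    not_a_total = f01 + f00
    not_b_total = f10 + f00

    # Added Value: P(B | A) - P(B)
    added_value = f11 / a_total - b_total / n

    # Yule's Q: (f11*f00 - f10*f01) / (f11*f00 + f10*f01)
    concordant = f11 * f00
    discordant = f10 * f01
    yules_q = (concordant - discordant) / (concordant + discordant)

    # Chi-Square statistic for a 2 x 2 table (no continuity correction)
    chi_square = (n * (concordant - discordant) ** 2
                  / (a_total * not_a_total * b_total * not_b_total))

    return {"added_value": added_value,
            "yules_q": yules_q,
            "chi_square": chi_square}


# Hypothetical counts: 100 records, 40 contain A, 50 contain B, 30 contain both.
scores = contingency_measures(f11=30, f10=10, f01=20, f00=40)
print(scores)
```

All three scores are zero when A and B are statistically independent and grow as the association strengthens, which is why such measures can be ranked against each other on the same table.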
“…Considering these issues, a number of recent approaches [11,12,15,18,19,20] adapted uncompressed bitmap-based representations (i.e., vertical bit vectors). In these approaches, a bitmap is generated for each item in the dataset, where each bit represents presence or absence of the item in a transaction.…”
Section: Bitmap-based Representations (mentioning)
confidence: 99%
“…In these approaches, a bitmap is generated for each item in the dataset, where each bit represents presence or absence of the item in a transaction. Some of these approaches [15] also reduce the number of bitmaps by eliminating non-frequent 1-itemsets as a preprocessing step. Support is calculated by ANDing (i.e., intersecting) bitmaps of all items in the itemset, and counting the number of one-bits in the resulting bitmap.…”
Section: Bitmap-based Representations (mentioning)
confidence: 99%
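
The vertical bitmap scheme described in the two statements above can be sketched as follows. This is an illustrative sketch only, not the implementation of any cited approach: it uses Python integers as uncompressed bit vectors, and the function names (`build_bitmaps`, `popcount`, `support`), the `min_support` parameter, and the sample transactions are all assumptions.

```python
def popcount(bits):
    """Count the one-bits in an integer used as a bit vector."""
    return bin(bits).count("1")


def build_bitmaps(transactions, min_support=1):
    """Build one bitmap per item; bit i is set if transaction i has the item.

    Items whose own support is below min_support (non-frequent 1-itemsets)
    are dropped as a preprocessing step.
    """
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return {item: bits for item, bits in bitmaps.items()
            if popcount(bits) >= min_support}


def support(itemset, bitmaps, n_transactions):
    """Support of an itemset: AND the item bitmaps, then count one-bits."""
    result = (1 << n_transactions) - 1   # start with every transaction set
    for item in itemset:
        result &= bitmaps.get(item, 0)   # a pruned or unknown item gives 0
    return popcount(result)


# Hypothetical data: four transactions over the items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b"}, {"a", "b", "c"}]
bitmaps = build_bitmaps(transactions, min_support=2)
print(support({"a", "b"}, bitmaps, len(transactions)))  # transactions 0 and 3
```

Because the intersection is a single bitwise AND per item, the cost of a support count depends on the number of transactions rather than on their lengths, which is the usual motivation for vertical representations.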