Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2002
DOI: 10.1145/775047.775133

A robust and efficient clustering algorithm based on cohesion self-merging

Abstract: Data clustering has attracted a lot of research attention in the field of computational statistics and data mining. In most related studies, the dissimilarity between two clusters is defined as the distance between their centroids, or the distance between two closest (or farthest) data points. However, all of these measurements are vulnerable to outliers, and removing the outliers precisely is yet another difficult task. In view of this, we propose a new similarity measurement, referred to as cohesion, to meas…
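
The abstract breaks off before it defines cohesion, so the sketch below does not attempt to implement it. It only makes concrete the three conventional inter-cluster dissimilarities that the abstract calls vulnerable to outliers: centroid distance, closest-pair distance (single linkage) and farthest-pair distance (complete linkage). It is a minimal sketch assuming 2-D points and Euclidean distance; the helper names and the toy clusters are hypothetical and not taken from the paper.

```python
from __future__ import annotations

import math
from itertools import product


def centroid(cluster: list[tuple[float, float]]) -> tuple[float, float]:
    """Mean of the points in a 2-D cluster (hypothetical helper)."""
    n = len(cluster)
    xs, ys = zip(*cluster)
    return (sum(xs) / n, sum(ys) / n)


def centroid_distance(c1, c2) -> float:
    """Euclidean distance between the two cluster centroids."""
    return math.dist(centroid(c1), centroid(c2))


def single_link(c1, c2) -> float:
    """Distance between the two closest points, one from each cluster."""
    return min(math.dist(a, b) for a, b in product(c1, c2))


def complete_link(c1, c2) -> float:
    """Distance between the two farthest points, one from each cluster."""
    return max(math.dist(a, b) for a, b in product(c1, c2))


def report(label, c1, c2) -> None:
    """Print all three dissimilarities for one pair of clusters."""
    print(f"{label:<13} centroid={centroid_distance(c1, c2):6.2f} "
          f"single={single_link(c1, c2):6.2f} "
          f"complete={complete_link(c1, c2):6.2f}")


# Two well-separated unit squares (hypothetical toy data).
A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
B = [(10.0, 0.0), (11.0, 0.0), (10.0, 1.0), (11.0, 1.0)]

if __name__ == "__main__":
    report("clean", A, B)
    # One outlier in A lying toward B: centroid and single-link distances shrink.
    report("near outlier", A + [(8.0, 0.5)], B)
    # One outlier in A lying away from B: centroid and complete-link distances grow.
    report("far outlier", A + [(-9.0, 0.5)], B)
```

With these toy clusters, the near outlier lowers the centroid distance from 10 to 8.5 and the single-link distance from 9 to about 2.06, while the far outlier raises the centroid distance to 11.9 and the complete-link distance from about 11.05 to about 20.01. Each measure is moved by at least one of the two single points, which is the sensitivity the abstract uses to motivate cohesion.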

Cited by 18 publications (11 citation statements). References: 15 publications.
“…If it does not exist (empty network) or if it does not satisfy the condition of equation 1, a new node is added with $w_{new} = p(t)$. In the first layer, if the network is not empty the threshold is adjusted by the formula (2). Otherwise the threshold is infinite:…”
Section: Proposed Methods (mentioning)
“…Figure 2. 2D noisy artificial data set used for the experiment. As stated in [2] and [19], neither clustering algorithm can correctly partition such a data set nor eliminate noise from clusters.…”
Section: Proposed Methods (mentioning)
“…There are other fast algorithms designed for clustering large numerical data sets, such as CLARANS [24], BIRCH [25], DBSCAN [26], CURE [27], and CSM [28]. In addition, several approaches in [29][30][31] are proposed to solve the high dimensionality and data sparsity problems of numerical data.…”
Section: Related Work (mentioning)
“…Data clustering is a useful technique for many applications, including similarity search, pattern recognition, trend analysis, marketing analysis, grouping, classification of documents, and so forth [3][7][10]. In data clustering, similar data points are grouped together in a cluster.…”
Section: Introduction (mentioning)