2009
DOI: 10.1214/09-aos700
Data spectroscopy: Eigenspaces of convolution operators and clustering

Abstract: This paper focuses on obtaining clustering information about a distribution from its i.i.d. samples. We develop theoretical results to understand and use clustering information contained in the eigenvectors of data adjacency matrices based on a radial kernel function with a sufficiently fast tail decay. In particular, we provide population analyses to gain insights into which eigenvectors should be used and when the clustering information for the distribution can be recovered from the sample. We learn that a f…

Cited by 75 publications (100 citation statements)
References 12 publications (17 reference statements)
“…[19] pointed out that when the data points have formed clusters, each high-density region implicitly corresponds to some low-frequency (smooth) eigenvector which takes relatively large absolute values for points in the region (cluster) and whose values are close to zero elsewhere. Note that we exclude the first eigenvector u_1 because it is nearly constant and does not form clusters.…”
Section: Spectral Filter
confidence: 99%
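The idea in this statement, that each cluster shows up as a leading eigenvector of a radial kernel matrix that is large in magnitude on that cluster and near zero elsewhere, can be sketched as follows. This is an illustrative toy in the spirit of the paper, not its exact DaSpec algorithm; the function name `daspec_sketch` and all parameter values are assumptions.

```python
import numpy as np

def daspec_sketch(X, sigma=0.5, n_top=2):
    """Toy eigenvector-based cluster labelling.

    Builds a Gaussian (radial) kernel matrix, extracts its leading
    eigenvectors, and labels each point by the eigenvector on which it
    takes the largest absolute value.
    """
    # Pairwise squared distances and Gaussian kernel matrix
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))

    # Leading eigenvectors of the kernel matrix (largest eigenvalues first)
    vals, vecs = np.linalg.eigh(K)
    top = vecs[:, np.argsort(vals)[::-1][:n_top]]

    # Each point goes to the eigenvector with the largest absolute
    # coordinate at that point; for well-separated clusters each leading
    # eigenvector is localized on one cluster.
    return np.argmax(np.abs(top), axis=1)
```

For two well-separated blobs the kernel matrix is nearly block diagonal, so its top two eigenvectors are each supported on one blob and the arg-max labelling recovers the two groups.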
“…This procedure gives the opportunity to pre-design (and hence know beforehand) the structure of the data that the clustering procedure aims to recover. At the first level, we compared the ability of OLYMPUS to determine the correct cluster number against other known unsupervised clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [32], Data Spectroscopic (DaSpec) [33] and Gaussian Mixture Model-BIC, as proposed in [34], as well as with Evolutionary k-prototype (EKP) [35], an unsupervised hybrid method that integrates evolutionary optimization with clustering categories. At the second level, we tested the accuracy of OLYMPUS against the original FSTS and other hybrid approaches such as the evolutionary fuzzy algorithm of Anand et al. [18] (called hereafter Anand) and the Improved Differential Evolutionary Fuzzy Clustering (IDEFC) [19].…”
Section: Synthetic Analysis
confidence: 99%
“…Some authors propose a global value of σ for the whole data set, e.g. [12] and [14], while others suggest using a local parameter, e.g. [18].…”
Section: Introduction
confidence: 99%
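The contrast drawn in this statement, one global kernel bandwidth σ versus a per-point local bandwidth, can be made concrete with a short sketch of locally scaled affinities, where each point's bandwidth is its distance to a k-th nearest neighbour. This follows the general "local scaling" idea rather than any specific cited paper; the function name and the default k are assumptions.

```python
import numpy as np

def local_scaling_affinity(X, k=2):
    """Affinity matrix with per-point bandwidths (local scaling).

    sigma_i is the distance from point i to its k-th nearest neighbour,
    and A_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), in contrast to a
    single global sigma for the whole data set.
    """
    d = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    # Row-sorted distances: column 0 is the distance to the point itself,
    # so column k is the k-th nearest neighbour distance.
    sigma = np.sort(d, axis=1)[:, k]
    A = np.exp(-d ** 2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(A, 0.0)  # no self-affinity
    return A
```

Because d and the outer product of the sigmas are both symmetric, the resulting affinity matrix stays symmetric, which a global-σ construction also guarantees.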
“…Another open issue of key importance in spectral clustering is that of choosing a proper number of groups. Usually this number is a user-defined parameter [12], but sometimes it is estimated, with varying success rates [14], in a heuristically motivated way. In this paper we present a spectral clustering algorithm, Speclus, that can simultaneously address both of the above-mentioned challenges for a variety of data sets.…”
Section: Introduction
confidence: 99%
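One common heuristic for the group-number estimation mentioned in this statement is the eigengap: take the number of clusters to be the position of the largest gap between consecutive sorted eigenvalues of the affinity (or Laplacian) matrix. The sketch below shows that heuristic in general form; it is not claimed to be the estimator used by any of the cited papers, and the function name is an assumption.

```python
import numpy as np

def eigengap_num_clusters(eigvals, max_k=10):
    """Eigengap heuristic for the number of clusters.

    Sorts eigenvalues in decreasing order and returns the index (1-based)
    of the largest gap between consecutive values among the top max_k.
    """
    vals = np.sort(np.asarray(eigvals))[::-1][:max_k]  # largest first
    gaps = vals[:-1] - vals[1:]                        # consecutive gaps
    return int(np.argmax(gaps)) + 1
```

For example, a spectrum with three eigenvalues near 1 followed by a sharp drop yields an estimate of three clusters.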