Notwithstanding the popularity of conventional clustering algorithms such as
K-means and probabilistic clustering, their clustering results are sensitive to
the presence of outliers in the data. Even a few outliers can compromise the
ability of these algorithms to identify meaningful hidden structures rendering
their outcome unreliable. This paper develops robust clustering algorithms that
not only aim to cluster the data, but also to identify the outliers. The novel
approaches rely on the infrequent presence of outliers in the data which
translates to sparsity in a judiciously chosen domain. Capitalizing on the
sparsity in the outlier domain, outlier-aware robust K-means and probabilistic
clustering approaches are proposed. Their novelty lies on identifying outliers
while effecting sparsity in the outlier domain through carefully chosen
regularization. A block coordinate descent approach is developed to obtain
iterative algorithms with convergence guarantees and small excess computational
complexity with respect to their non-robust counterparts. Kernelized versions
of the robust clustering algorithms are also developed to efficiently handle
high-dimensional data, identify nonlinearly separable clusters, or even cluster
objects that are not represented by vectors. Numerical tests on both synthetic
and real datasets validate the performance and applicability of the novel
algorithms.Comment: Submitted to IEEE Trans. on PAM
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.