Clustering Through Probability Distribution Analysis Along Eigenpaths

Yang, Wenming; Hui, Changqing; Sun, Daren; Sun, Xiang; Liao, Qingmin

doi:10.1109/tsmc.2018.2884839

Cited by 7 publications

(4 citation statements)

References 24 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The tensor spectral clustering methods have achieved superior performance when dealing with high-dimension low-sample-size (HDLSS) data [28], [29], [30]. However, the tensor spectral clustering methods share a common limitation, i.e., they need to construct and store the whole affinity tensor before deriving the tensor spectral embedding.…”

Section: Tensor Spectral Clusteringmentioning

confidence: 99%

Deep Tensor Spectral Clustering Network via Ensemble of Multiple Affinity Tensors

Cai,

Hu,

et al. 2024

IEEE Trans. Pattern Anal. Mach. Intell.

View full text Add to dashboard Cite

Tensor spectral clustering (TSC) is an emerging approach that explores multi-wise similarities to boost learning. However, two key challenges have yet to be well addressed in the existing TSC methods: (1) The construction and storage of high-order affinity tensors to encode the multi-wise similarities are memory-intensive and hampers their applicability, and (2) they mostly employ a two-stage approach that integrates multiple affinity tensors of different orders to learn a consensus tensor spectral embedding, thus often leading to a suboptimal clustering result. To this end, this paper proposes a tensor spectral clustering network (TSC-Net) to achieve one-stage learning of a consensus tensor spectral embedding, while reducing the memory cost. TSC-Net employs a deep neural network that learns to map the input samples to the consensus tensor spectral embedding, guided by a TSC objective with multiple affinity tensors. It uses stochastic optimization to calculate a small part of the affinity tensors, thereby avoiding loading the whole affinity tensors for computation, thus significantly reducing the memory cost. Through using an ensemble of multiple affinity tensors, the TSC can dramatically improve clustering performance. Empirical studies on benchmark datasets demonstrate that TSC-Net outperforms the recent baseline methods.

show abstract

Section: Tensor Spectral Clusteringmentioning

confidence: 99%

Deep Tensor Spectral Clustering Network via Ensemble of Multiple Affinity Tensors

Cai,

Hu,

et al. 2024

IEEE Trans. Pattern Anal. Mach. Intell.

View full text Add to dashboard Cite

show abstract

“…We have shown that high accuracy of clustering can be maintained without having to introduce supervised learning. Paper [24] proposed modeling any high dimensional clustering problem as a one-dimensional analysis of the probability distribution under the assumption that clusters are high-density regions in the feature space separated by relatively low-density neighbors.…”

Section: B Related Workmentioning

confidence: 99%

Expectation Distance-based Distributional Clustering for Noise-Robustness

Adesunkanmi¹,

Kumar²

2021

Preprint

View full text Add to dashboard Cite

This paper presents noise-robust clustering techniques in unsupervised machine learning. The uncertainty about the noise, consistency, and other ambiguities can become severe obstacles in data analytics. As a result, data quality, cleansing, management, and governance remain critical disciplines when working with Big Data. With this complexity, it is no longer sufficient to treat data deterministically as in a classical setting, and it becomes meaningful to account for noise distribution and its impact on data sample values. Classical clustering methods group data into "similarity classes" depending on their relative distances or similarities in the underlying space. This paper addressed this problem via the extension of classical K-means and K-medoids clustering over data distributions (rather than the raw data). This involves measuring distances among distributions using two types of measures: the optimal mass transport (also called Wasserstein distance, denoted W2) and a novel distance measure proposed in this paper, the expected value of random variable distance (denoted ED). The presented distribution-based K-means and K-medoids algorithms cluster the data distributions first and then assign each raw data to the cluster of data's distribution. These noise-robust clustering algorithms have been implemented in MatLab and applied to cluster noisy real-world weather data by efficiently extracting and using underlying uncertainty information (means and variances). The results on weather data show striking improvement in performance for W2 and ED distance-based K-means and K-medoids, and where higher accuracy is observed for ED compared to W2 for both K-means and K-medoids. This is because while W2 works with marginal distributions ignoring the actual correlations in computing the distance measure, ED works with the joint distributions factoring the correlations into the distance measurements.

show abstract

“…However, there is a critical prerequisite in aforementioned methods: the data relationship can be accurately described by pairwise affinity. This can be challenging in real applications, especially for highdimension m yet low-sample-size n (HDLSS) data when n ≪ m [15], [16]. The clustering performance of HDLSS data is hindered by the concentration effects, also known as the "curse of dimensionality" [17].…”

Section: Introductionmentioning

confidence: 99%

Multiview Tensor Spectral Clustering via Co-regularization

Cai,

Wang,

et al. 2024

IEEE Trans. Pattern Anal. Mach. Intell.

View full text Add to dashboard Cite

Graph-based multi-view clustering encodes multi-view data into sample affinities to find consensus representation, effectively overcoming heterogeneity across different views. However, traditional affinity measures tend to collapse as the feature dimension expands, posing challenges in estimating a unified alignment that reveals both crossview and inner relationships. To tackle this challenge, we propose to achieve multi-view uniform clustering via consensus representation coregularization. First, the sample affinities are encoded by both popular dyadic affinity and recent high-order affinities to comprehensively characterize spatial distributions of the HDLSS data. Second, a fused consensus representation is learned through aligning the multi-view lowdimensional representation by co-regularization. The learning of the fused representation is modeled by a high-order eigenvalue problem within manifold space to preserve the intrinsic connections and complementary correlations of original data. A numerical scheme via manifold minimization is designed to solve the high-order eigenvalue problem efficaciously. Experiments on eight HDLSS datasets demonstrate the effectiveness of our proposed method in comparison with the recent thirteen benchmark methods.

show abstract

Clustering Through Probability Distribution Analysis Along Eigenpaths

Cited by 7 publications

References 24 publications

Deep Tensor Spectral Clustering Network via Ensemble of Multiple Affinity Tensors

Deep Tensor Spectral Clustering Network via Ensemble of Multiple Affinity Tensors

Expectation Distance-based Distributional Clustering for Noise-Robustness

Multiview Tensor Spectral Clustering via Co-regularization

Contact Info

Product

Resources

About