Subspace Clustering of Very Sparse High-Dimensional Data

Peng, Hankui; Pavlidis, Nicos G.; Eckley, Idris A.; Tsalamanis, Ioannis

doi:10.1109/bigdata.2018.8622472

Cited by 5 publications

(2 citation statements)

References 15 publications


“…In many challenging real-world applications involving the grouping of high-dimensional data, points from each group (cluster) can be well approximated by a distinct lower dimensional linear subspace. This is the case in gene sequencing (McWilliams and Montana 2014), cancer genomics (Yeoh et al 2002), face clustering (Elhamifar and Vidal 2013), motion segmentation (Rao et al 2010), and text mining (Peng et al 2018). The problem of simultaneously estimating the linear subspace corresponding to each cluster, and assigning each point to the closest subspace is known as subspace clustering (Vidal 2011).…”

Section: Introduction

confidence: 99%

Weighted sparse simplex representation: a unified framework for subspace clustering, constrained clustering, and active learning

Peng

Pavlidis

2022

Data Min Knowl Disc

Self Cite


Spectral-based subspace clustering methods have proved successful in many challenging applications such as gene sequencing, image recognition, and motion segmentation. In this work, we first propose a novel spectral-based subspace clustering algorithm that seeks to represent each point as a sparse convex combination of a few nearby points. We then extend the algorithm to a constrained clustering and active learning framework. Our motivation for developing such a framework stems from the fact that typically either a small amount of labelled data are available in advance; or it is possible to label some points at a cost. The latter scenario is typically encountered in the process of validating a cluster assignment. Extensive experiments on simulated and real datasets show that the proposed approach is effective and competitive with state-of-the-art methods.


“…So, most elements are zero in a row. DTM suffers from two problems: sparsity and high dimensionality (Peng et al, 2018). Sparsity means that the number of elements having zero value is more than the number of elements having non-zero value (Karami, 2017).…”

Section: Introduction

confidence: 99%

Application of Fuzzy Clustering for Text Data Dimensionality Reduction

Karami

2019

IJKEDM


Large textual corpora are often represented by the document-term frequency matrix whose elements are the frequency of terms; however, this matrix has two problems: sparsity and high dimensionality. Four dimension reduction strategies are used to address these problems. Of the four strategies, unsupervised feature transformation (UFT) is a popular and efficient strategy to map the terms to a new basis in the document-term frequency matrix. Although several UFT-based methods have been developed, fuzzy clustering has not been considered for dimensionality reduction. This research explores fuzzy clustering as a new UFT-based approach to create a lower-dimensional representation of documents. Performance of fuzzy clustering with and without using global term weighting methods is shown to exceed principal component analysis and singular value decomposition. This study also explores the effect of applying different fuzzifier values on fuzzy clustering for dimensionality reduction purpose.
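As a rough illustration of the sparsity problem described in the citation statement and abstract above, the following minimal sketch builds a document-term frequency matrix for a toy corpus and measures the share of zero entries. The corpus, the variable names, and the whitespace tokenisation are assumptions made for illustration; none of them come from the cited papers.

```python
from collections import Counter

# Hypothetical toy corpus (an assumption, not data from either paper).
docs = [
    "subspace clustering of sparse data",
    "fuzzy clustering for text data",
    "sparse high dimensional text",
]

# Vocabulary: every distinct whitespace-separated term, in sorted order.
vocab = sorted({word for doc in docs for word in doc.split()})

# Document-term frequency matrix: one row per document, one column per term.
dtm = []
for doc in docs:
    counts = Counter(doc.split())
    dtm.append([counts[word] for word in vocab])

# Count zero and non-zero entries to quantify sparsity.
total = len(docs) * len(vocab)
zeros = sum(1 for row in dtm for value in row if value == 0)
nonzeros = total - zeros
sparsity = zeros / total

# Sparse in the sense quoted above: more zero than non-zero entries.
is_sparse = zeros > nonzeros

print(f"{len(docs)} documents x {len(vocab)} terms, sparsity = {sparsity:.3f}")
```

Even in this three-document example more than half of the entries are zero. Real corpora have vocabularies of many thousands of terms, so the fraction of zeros is typically far higher.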
