2018 IEEE International Conference on Big Data (Big Data) 2018
DOI: 10.1109/bigdata.2018.8622472

Subspace Clustering of Very Sparse High-Dimensional Data

Abstract: In this paper we consider the problem of clustering collections of very short texts using subspace clustering. This problem arises in many applications such as product categorisation, fraud detection, and sentiment analysis. The main challenge lies in the fact that the vectorial representation of short texts is both high-dimensional, due to the large number of unique terms in the corpus, and extremely sparse, as each text contains a very small number of words with no repetition. We propose a new, simple subspa…
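To make the "high-dimensional and extremely sparse" point concrete, here is a minimal Python sketch of a binary bag-of-words representation for short texts. It assumes a naive lowercase whitespace tokenizer and presence/absence weights, and the function name `binary_term_vectors` is hypothetical; this is not the representation pipeline from the paper. Each text is stored only as the column indices of its distinct terms, so the zeros that dominate the full vector are never materialised.

```python
def binary_term_vectors(texts):
    """Represent each short text by the sorted column indices of its distinct
    terms (binary bag-of-words); zero entries are never stored."""
    vocab = {}  # term -> column index, assigned in order of first appearance
    rows = []
    for text in texts:
        # Naive lowercase whitespace tokenizer (an assumption); dict.fromkeys
        # drops repeated words while keeping first-occurrence order.
        terms = dict.fromkeys(text.lower().split())
        cols = []
        for term in terms:
            if term not in vocab:
                vocab[term] = len(vocab)
            cols.append(vocab[term])
        rows.append(sorted(cols))
    return vocab, rows


texts = ["red running shoes", "blue running shoes", "wireless phone charger"]
vocab, rows = binary_term_vectors(texts)
n_dims = len(vocab)
nnz = sum(len(r) for r in rows)
print(n_dims, rows)                 # 7 [[0, 1, 2], [1, 2, 3], [4, 5, 6]]
print(nnz / (len(texts) * n_dims))  # fraction of non-zero entries
```

The dimensionality grows with the whole corpus vocabulary while each row keeps only a handful of entries, which is why the fraction of non-zeros shrinks as more texts are added.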

Cited by 5 publications (2 citation statements); references 15 publications.
“…In many challenging real-world applications involving the grouping of high-dimensional data, points from each group (cluster) can be well approximated by a distinct lower dimensional linear subspace. This is the case in gene sequencing (McWilliams and Montana 2014), cancer genomics (Yeoh et al 2002), face clustering (Elhamifar and Vidal 2013), motion segmentation (Rao et al 2010), and text mining (Peng et al 2018). The problem of simultaneously estimating the linear subspace corresponding to each cluster, and assigning each point to the closest subspace is known as subspace clustering (Vidal 2011).…”
Section: Introduction; citation type: mentioning; confidence: 99%
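The two coupled tasks named in this statement (estimate one linear subspace per cluster, assign each point to the closest subspace) can be illustrated with K-subspaces, a classic alternating baseline. It is not the method proposed in the paper. The minimal NumPy sketch below assumes every subspace passes through the origin and shares a known dimension `dim`; `k_subspaces` and its parameters are hypothetical names.

```python
import numpy as np


def k_subspaces(X, k, dim, n_iter=20, init_labels=None, seed=0):
    """K-subspaces sketch: alternate between fitting a `dim`-dimensional linear
    subspace (through the origin) to each cluster and reassigning every point
    to the subspace with the smallest projection residual."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    if init_labels is None:
        labels = rng.integers(k, size=n)  # random initial assignment
    else:
        labels = np.asarray(init_labels)
    bases = [None] * k
    for _ in range(n_iter):
        # Fit step: the top-`dim` right singular vectors of a cluster's points
        # span its best-fitting `dim`-dimensional linear subspace (least squares).
        for j in range(k):
            pts = X[labels == j]
            if len(pts) < dim:
                # Too few points to span a subspace: reseed from random points.
                pts = X[rng.choice(n, size=dim, replace=False)]
            _, _, vt = np.linalg.svd(pts, full_matrices=False)
            bases[j] = vt[:dim].T  # shape (d, dim), orthonormal columns
        # Assign step: distance from each point to each subspace.
        resid = np.stack(
            [np.linalg.norm(X - (X @ B) @ B.T, axis=1) for B in bases], axis=1
        )
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: a local optimum has been reached
        labels = new_labels
    return labels, bases


# Two 1-D subspaces in R^3 (the x-axis and the y-axis), with one point
# deliberately mislabelled in the initial assignment.
X = np.array([[1, 0, 0], [2, 0, 0], [3, 0, 0],
              [0, 1, 0], [0, 2, 0], [0, 3, 0]], dtype=float)
labels, bases = k_subspaces(X, k=2, dim=1, init_labels=[0, 0, 0, 1, 1, 0])
print(labels)  # [0 0 0 1 1 1]
```

Like k-means, this alternation only finds a local optimum and is sensitive to initialisation, so in practice it is usually run with several restarts.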
“…So, most elements are zero in a row. DTM suffers from two problems: sparsity and high dimensionality (Peng et al, 2018). Sparsity means that the number of elements having zero value is more than the number of elements having non-zero value (Karami, 2017).…”
Section: Introduction; citation type: mentioning; confidence: 99%
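The sparsity definition quoted above (more zero-valued than non-zero elements) reduces to a simple count. The following minimal Python sketch assumes the document-term matrix is a small dense list of rows, with documents as rows and vocabulary terms as columns. The helper name `sparsity` is hypothetical, and at realistic vocabulary sizes one would store only the non-zeros instead.

```python
def sparsity(dtm):
    """Return (share of zero entries, whether zeros outnumber non-zeros) for a
    document-term matrix given as a list of equal-length rows."""
    total = sum(len(row) for row in dtm)
    zeros = sum(1 for row in dtm for value in row if value == 0)
    return zeros / total, zeros > total - zeros


# Rows are documents, columns are vocabulary terms (binary presence).
dtm = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
]
share, is_sparse = sparsity(dtm)
print(share, is_sparse)  # 0.6 True
```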