2019
DOI: 10.1007/s11634-019-00356-9

Robust and sparse k-means clustering for high-dimensional data

Abstract: In real-world application scenarios, the identification of groups poses a significant challenge due to possibly occurring outliers and existing noise variables. Therefore, there is a need for a clustering method which is capable of revealing the group structure in data containing both outliers and noise variables without any pre-knowledge. In this paper, we propose a k-means-based algorithm incorporating a weighting function which leads to an automatic weight assignment for each observation. In order to cope w…
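The abstract's central idea, re-weighting each observation so that likely outliers contribute less to the center updates, can be illustrated with a minimal sketch. The exponential weighting function and the per-iteration update rule below are placeholders chosen for illustration, not the paper's actual algorithm:

```python
import numpy as np

def weighted_kmeans(X, k, n_iter=50, rng=None):
    """Illustrative k-means variant that down-weights likely outliers.

    Observation weights are recomputed each iteration from the distance
    to the nearest center; centers are then weighted means. The specific
    weighting function here is an assumption for illustration only.
    """
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Pairwise distances: shape (n_samples, k)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        dmin = d.min(axis=1)
        # Observations far from their center (relative to the median
        # distance) receive exponentially smaller weights
        w = np.exp(-(dmin / (np.median(dmin) + 1e-12)) ** 2)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    return labels, centers, w
```

Observations assigned weights near zero are effectively treated as outliers, which is the behavior the abstract describes as "automatic weight assignment for each observation".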

Cited by 30 publications (52 citation statements)
References 34 publications
“…Estimating the number of clusters in a data set is challenging; however, methods such as the gap statistic can be added to the workflow for choosing the number of clusters (Tibshirani et al., 2001). Additionally, recent approaches to clustering, such as robust (weighted) sparse k-means clustering, have the advantage of simultaneously identifying clusters and informative features for partitioning the data that can be used in feature selection (Brodinová et al., 2019). Finally, growth mixture models for cluster analysis of longitudinal data may be more suitable for data analysis from studies that include a series of sequential measurements of cortical development (Wei et al., 2017).…”
Section: Discussion
confidence: 99%
“…Nonetheless, clustering methods are advancing and therefore we do not advocate that ascendant hierarchical clustering is the only method applied on further datasets. For example, a recent paper advances upon k-means clustering to account for outliers and noise variables (Brodinová et al., 2019). As always with analysis of datasets, it is necessary to explore the available tools to find an appropriate choice.…”
Section: Discussion
confidence: 99%
“…QE is the average distance between each node and its best matching unit (BMU), while TE measures the wellness of the map structure by calculating the node's first and second BMUs and their position in relation to each other (Villmann et al., 1997; Kohonen, 2001; Breard, 2017). Smaller QE and TE values indicate a better fit of the map itself (Kohonen, 2001; Breard, 2017). Once the SOM has been trained, the data was visualized into a U-matrix (unified distance matrix) along with eight component planes.…”
Section: Self-organizing Maps
confidence: 99%
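The QE and TE definitions quoted above can be computed directly from a trained SOM's codebook. A minimal sketch, assuming the codebook vectors and their integer grid coordinates are given, and that TE counts samples whose first and second BMUs are not immediate grid neighbours (a common convention; exact definitions vary across implementations):

```python
import numpy as np

def som_errors(X, codebook, grid):
    """Quantization error (QE) and topographic error (TE) for a trained SOM.

    X:        (n_samples, n_features) data
    codebook: (n_nodes, n_features) node weight vectors
    grid:     (n_nodes, 2) integer map coordinates of each node
    """
    # Distance of every sample to every node: shape (n_samples, n_nodes)
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    order = d.argsort(axis=1)
    bmu1, bmu2 = order[:, 0], order[:, 1]
    # QE: mean distance from each sample to its best matching unit
    qe = d[np.arange(len(X)), bmu1].mean()
    # TE: fraction of samples whose two BMUs are not grid-adjacent
    # (Chebyshev distance 1 on the grid means adjacent, incl. diagonals)
    adjacent = np.abs(grid[bmu1] - grid[bmu2]).max(axis=1) <= 1
    te = 1.0 - adjacent.mean()
    return qe, te
```

Smaller values of both quantities indicate a better fit, as the quoted passage notes.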
“…In this case, a gap statistic method was used. The gap statistic evaluates the dataset and provides the highest possible number of clusters suitable for the analysis (Tibshirani et al., 2001; Brodinová et al., 2019). After the gap value was calculated, the accurate k value was then applied to the k-means method.…”
Section: Principal Component Analysis (PCA) and Cluster Analysis
confidence: 99%
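The workflow quoted above (choose k with the gap statistic, then run k-means with that k) can be sketched in plain NumPy. The `kmeans` helper and the number of reference sets are illustrative assumptions; the selection rule follows Tibshirani et al. (2001): pick the smallest k with gap(k) ≥ gap(k+1) − s(k+1):

```python
import numpy as np

def kmeans(X, k, n_iter=50, rng=None):
    """Plain Lloyd's k-means (illustrative helper)."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def pooled_within_ss(X, labels, centers):
    """Pooled within-cluster sum of squares W_k."""
    return sum(((X[labels == j] - c) ** 2).sum() for j, c in enumerate(centers))

def gap_statistic(X, k_max=6, n_ref=10, rng=0):
    """Choose k by comparing log(W_k) on the data against uniform
    reference data drawn from the bounding box of X."""
    rng = np.random.default_rng(rng)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        labels, centers = kmeans(X, k, rng=rng)
        log_w = np.log(pooled_within_ss(X, labels, centers))
        ref = []
        for _ in range(n_ref):
            R = rng.uniform(lo, hi, size=X.shape)
            rl, rc = kmeans(R, k, rng=rng)
            ref.append(np.log(pooled_within_ss(R, rl, rc)))
        gaps.append(np.mean(ref) - log_w)
        s.append(np.std(ref) * np.sqrt(1 + 1 / n_ref))
    # Smallest k whose gap is within one adjusted sd of the next gap
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps
```

The chosen k would then be passed to the final k-means run, as described in the quoted passage.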