2016
DOI: 10.3102/1076998616631743

A Survey of Popular R Packages for Cluster Analysis

Abstract: Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring datasets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans and hclust functions; the mclust library; the poLCA library; and the clustMD library. The packages/functions cover a variety of cluster analysis methods for continuous data, categorical data or a collection of the two. The co…

Cited by 35 publications
(29 citation statements)
References 15 publications
“…LS was considered a latent variable, that is, not directly observable, and was evaluated by Latent Class Analysis (LCA) [37]. With the information from the manifested variables, we fit a statistical model that allowed estimating the probability of a given individual belonging to each of the latent variable categories [38].…”
Section: Methods (mentioning)
confidence: 99%
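The class-membership probability described in this statement can be written out explicitly. The following is a sketch of the standard latent class formulation under the usual local-independence assumption. The notation is chosen here for illustration and is not taken from the citing paper: \pi_c is the prevalence of class c, and p_{jc}(y) is the probability that a member of class c gives response y on manifest variable j.

\[
P(C_i = c \mid y_{i1}, \dots, y_{iJ})
  = \frac{\pi_c \prod_{j=1}^{J} p_{jc}(y_{ij})}
         {\sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} p_{jk}(y_{ij})}
\]

Each individual i is then typically assigned to the class with the largest posterior probability.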
“…The analysis classified the entire sample into two clusters (groups). The number of clusters was determined based on the “elbow” method. According to this method, the number of clusters for k-means was chosen by fitting k-means models for a range of consecutive numbers, usually 1 up to some maximum number, and plotting an elbow plot of the total within sum of squares (WSS) value for each number of clusters versus that cluster number (Flynt and Dean, 2016). The value for one cluster was WSS = 3358.3, and the value for two clusters was WSS = 1801.9.…”
Section: Results (mentioning)
confidence: 99%
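The elbow procedure described in this statement can be sketched in a few lines. This is a minimal illustration, not the citing authors' code and not the R workflow from the reviewed article: it uses a plain Lloyd-style k-means written with the Python standard library, and the function names (`sqdist`, `kmeans_wss`, `elbow_wss`), the toy data, and the restart count are all assumptions made for the example. In R, the same quantity is reported by `kmeans` as `tot.withinss`.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_wss(points, k, rng, max_iter=100):
    """Run one Lloyd-style k-means fit and return its total within-cluster SS."""
    centers = [tuple(p) for p in rng.sample(points, k)]  # k distinct starting points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))

def elbow_wss(points, max_k, n_init=10, seed=0):
    """Total WSS for k = 1..max_k, keeping the best of n_init random starts."""
    rng = random.Random(seed)
    return {k: min(kmeans_wss(points, k, rng) for _ in range(n_init))
            for k in range(1, max_k + 1)}

# Toy data (hypothetical): two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

wss = elbow_wss(points, max_k=4)
for k, value in wss.items():
    print(k, round(value, 3))
```

On this toy data the WSS falls from 404 at one cluster to 4 at two clusters and changes little afterwards, which is the "elbow" that points to two clusters. With real data the drop is rarely this sharp, so the plot is usually read alongside other criteria.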
“…To better understand the adjustment patterns, we classified the participants into clusters using a k-means cluster analysis with the four adjustment subscales. The number of clusters was determined according to the ‘elbow’ method, in which the number of clusters for k-means was chosen by fitting k-means models for a range of consecutive numbers, usually 1 up to some maximum number, and plotting an elbow plot of the total within sum of squares (WSS) value for each number of clusters versus that cluster number (Flynt and Dean, 2016). In addition, cross-tabulations and chi square tests were used to compare the groups on demographic variables.…”
Section: Methods (mentioning)
confidence: 99%
“…An example of how the statistical analyses can be conducted includes: (1) subgroup identification across headache categories by means of multidimensional scaling, factor analysis and/or cluster analysis (eg, hierarchical cluster analysis, k-means clustering, latent class analysis, and fuzzy cluster models) [20][21][22] and (2) statistical model fitting, eg, of the biochemical compound concentration as a response variable and the headache category, subgroup and confounding factors (eg, sex and age) as explanatory variables while considering the risk of overfitting. [23][24][25][26] Data dimensionality and complexity should be considered during sample size determination.…”
Section: Subgroup Identification Based On Profiles (mentioning)
confidence: 99%