2012
DOI: 10.18637/jss.v047.i05

High-Dimensional Bayesian Clustering with Variable Selection: The R Package bclust

Abstract: The R package bclust is useful for clustering high-dimensional continuous data. The package uses a parametric spike-and-slab Bayesian model to downweight the effect of noise variables and to quantify the importance of each variable in agglomerative clustering. We take advantage of the existence of closed-form marginal distributions to estimate the model hyper-parameters using empirical Bayes, thereby yielding a fully automatic method. We discuss computational problems arising in implementation of the procedure…
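To make the abstract's description concrete, here is a minimal Python sketch of Bayesian agglomerative clustering driven by a closed-form spike-and-slab marginal likelihood. It is not the bclust implementation and should be read as an illustration under stated assumptions: the data are taken as centred, each variable in each cluster has a shared effect that is either exactly zero (spike) or drawn from N(0, tau2) (slab), the hyperparameters `sigma2`, `tau2` and `p` are fixed by hand instead of being estimated by empirical Bayes, no prior over partitions is used, and all function names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical simplified model (NOT bclust's exact model): data are centred,
# noise variance sigma2, slab variance tau2 and slab probability p are fixed by hand.

def log_spike(y, sigma2):
    """log N(y; 0, sigma2 * I): the variable carries no cluster effect."""
    n = len(y)
    ss = sum(v * v for v in y)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

def log_slab(y, sigma2, tau2):
    """log N(y; 0, sigma2 * I + tau2 * 11^T): a shared cluster effect ~ N(0, tau2)."""
    n = len(y)
    ss = sum(v * v for v in y)
    s = sum(y)
    # Sherman-Morrison inverse and the matrix determinant lemma give closed forms.
    quad = (ss - tau2 * s * s / (sigma2 + n * tau2)) / sigma2
    logdet = n * math.log(sigma2) + math.log1p(n * tau2 / sigma2)
    return -0.5 * (n * math.log(2 * math.pi) + logdet + quad)

def log_marginal(rows, sigma2, tau2, p):
    """Sum over variables of the spike-and-slab mixture marginal for one cluster."""
    total = 0.0
    for v in range(len(rows[0])):
        y = [r[v] for r in rows]
        a = math.log(1 - p) + log_spike(y, sigma2)
        b = math.log(p) + log_slab(y, sigma2, tau2)
        m = max(a, b)
        total += m + math.log(math.exp(a - m) + math.exp(b - m))  # log-sum-exp
    return total

def agglomerate(data, sigma2=1.0, tau2=4.0, p=0.5):
    """Greedily merge the pair with the largest log-marginal gain; keep the best partition seen."""
    cache = {}

    def score(cluster):
        key = tuple(sorted(cluster))
        if key not in cache:
            cache[key] = log_marginal([data[i] for i in key], sigma2, tau2, p)
        return cache[key]

    clusters = [[i] for i in range(len(data))]
    current = sum(score(c) for c in clusters)
    best_score, best_partition = current, [list(c) for c in clusters]
    while len(clusters) > 1:
        gain, i, j = max(
            (score(clusters[i] + clusters[j]) - score(clusters[i]) - score(clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        current += gain
        if current > best_score:
            best_score, best_partition = current, [sorted(c) for c in clusters]
    return best_score, best_partition

# Toy data: two informative variables separate two groups; the third is noise.
data = [
    [3.0, 3.1, 0.1], [2.9, 3.2, -0.2], [3.2, 2.8, 0.0],
    [-3.0, -2.9, 0.2], [-3.1, -3.0, -0.1], [-2.8, -3.2, 0.1],
]
score, partition = agglomerate(data)
print(sorted(partition))
```

Because each cluster's marginal factorises over variables, a pure-noise variable changes the merge score far less than an informative one, which is the downweighting effect the abstract refers to. Roughly speaking, an empirical Bayes step would replace the hand-set `sigma2`, `tau2` and `p` with values that maximise the total marginal likelihood.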

Cited by 34 publications (9 citation statements)
References 29 publications
“…To determine the subtypes of LIHC, we performed a Bayesian clustering method with a spike-and-slab hierarchical model, which was suitable for clustering high-dimensional data using the function "bclust" in R package "e1071" [20].…”
Section: Subtypes of LIHC (mentioning)
confidence: 99%
“…We applied Bayesian agglomerative clustering to find groupings in the infants based on their metabolomics profile. This approach is highly suitable for low-sample-size-high-dimensional data (41) where it is commonly difficult to provide reasonable statistical models (42,43). In contrast to the hierarchical cluster approach where the user has to decide and calculate other metrics such as the silhouette width in order to decide for the optimal grouping, the optimal grouping is returned by the Bayesian clustering procedure.…”
Section: Statistical Considerations (mentioning)
confidence: 99%
“…The reason why we chose Bayesian clustering is that it is useful for high-dimensional continuous data (41). This is in contrast to distance-based hierarchical clustering techniques which may fail in high-dimensional settings (42,43).…”
Section: Statistical Considerations (mentioning)
confidence: 99%
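The claim that distance-based hierarchical clustering may fail in high dimensions is often explained by distance concentration. The short Python sketch below is not taken from the cited work; it assumes i.i.d. uniform points and uses a hypothetical helper `distance_contrast` to show that the ratio of the largest to the smallest pairwise Euclidean distance shrinks toward 1 as the dimension grows, leaving linkage rules little contrast to work with.

```python
import math
import random
from itertools import combinations

def distance_contrast(n_points, dim, seed=0):
    """Ratio of the largest to the smallest pairwise Euclidean distance
    among n_points drawn uniformly from the unit cube in `dim` dimensions."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(a, b) for a, b in combinations(pts, 2)]
    return max(dists) / min(dists)

# The contrast typically drops sharply as the dimension increases.
for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(50, dim), 2))
```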
“…Examples of SEM approaches are introduced and discussed by (Berry, Carlin, Lee, and Müller 2010, chapter 2), Thall, Wathen, Bekele, Champlin, Baker, and Benjamin (2003), and Berry, Broglio, Groshen, and Berry (2013, with providing additional background on these specific SEM implementations. SEM approaches are also implemented in packages by Nia and Davison (2012) and Savage, Cooke, Darkins, and Xu (2018) and have been extended to more specialized applications in fMRI studies (Stocco 2014), modeling clearance rates of parasites in biological organisms (Sharifi-Malvajerdi, Zhu, Fogarty, Fay, Fairhurst, Flegg, Stepniewska, and Small 2019), modeling genomic bifurcations (Campbell and Yau 2017), modeling ChIP-seq data through hidden Ising models (Mo 2018), modeling genome-wide nucleosome positioning with high-throughput short-read data (Samb, Khadraoui, Belleau, Deschênes, Lakhal-Chaieb, and Droit 2015), and modeling cross-study analysis of differential gene expression (Scharpf, Tjelmeland, Parmigiani, and Nobel 2009).…”
Section: The Single-source Exchangeability Model (mentioning)
confidence: 99%