2015
DOI: 10.1016/j.patcog.2015.05.026

Clustering of multivariate binary data with dimension reduction via L1-regularized likelihood maximization

Abstract: Clustering methods with dimension reduction have been receiving considerable wide interest in statistics lately and a lot of methods to simultaneously perform clustering and dimension reduction have been proposed. This work presents a novel procedure for simultaneously determining the optimal cluster structure for multivariate binary data and the subspace to represent that cluster structure. The method is based on a finite mixture model of multivariate Bernoulli distributions, and each component…

Cited by 17 publications (12 citation statements)
References 35 publications (43 reference statements)
“…This can occur in pangenomics as the discovery rate of new families in the pangenome slightly decreases when new genomes are added. Mathematical solutions to this problem seem to exist [50][51][52] for example via the weighting of genomes (based on their respective contribution to the pangenome diversity) or via sparse partitioning methods. An improvement of NEM should include these solutions and could be a perspective of this work.…”
Section: Issues Resulting From High-dimensional Statistics and Parall | mentioning
confidence: 99%
“…Actually, it can be the case in pangenomics as the number of new families added to the pangenome slightly decreases when new genomes are added (see figure 3 in [1]). Mathematical solutions to this issue seem to exist [46,47,48] for example via the weighting of features, corresponding to the weighting of genomes in our case. An improved version of NEM should include this improvement and could be perspective of this work.…”
Section: Issues Resulting From High-dimensional Statistics | mentioning
confidence: 98%
“…Since many attributes are usually statistically irrelevant and independent of true categories, they may be removed or associated with small weights (Graham and Miller 2006;Bouguila 2010). This partially links mixture models with subspace clustering of discrete data (Yamamoto and Hayashi 2015;Chen et al 2016). Since the use of multinomial distributions formally requires an independence of attributes, different smoothing techniques were proposed, such as applying Dirichlet distributions as a prior to the multinomial (Bouguila and ElGuebaly 2009).…”
Section: Model-based Techniques | mentioning
confidence: 99%