2022
DOI: 10.1109/tpami.2021.3133763

A Variational EM Acceleration for Efficient Clustering at Very Large Scales

Abstract: How can we efficiently find very large numbers of clusters C in very large datasets of N data points with potentially high dimensionality D? Here we address this question using a novel variational approach to optimize Gaussian mixture models (GMMs) with diagonal covariance matrices. The variational method approximates expectation maximization (EM) by applying truncated posteriors as variational distributions and partial E-steps in combination with coresets. Run-time complexity to optimize the clustering objective then reduces…
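The truncated-posterior idea from the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and the function name and `C_prime` parameter are hypothetical: each data point keeps a normalized posterior over only its C′ best-matching clusters, with all other responsibilities set exactly to zero. Note that this sketch still scores every cluster before truncating; a key part of the paper's contribution is identifying candidate clusters without a full evaluation.

```python
import numpy as np

def truncated_e_step(X, mu, var, pi, C_prime=3):
    """Sketch of a truncated variational E-step for a diagonal-covariance GMM.

    For each data point, the posterior over clusters is restricted to the
    C' highest-scoring clusters; all other responsibilities are zero.
    """
    N, D = X.shape
    C = mu.shape[0]
    # Log joint log p(x, c) under diagonal Gaussians, shape (N, C).
    log_p = np.empty((N, C))
    for c in range(C):
        diff = X - mu[c]
        log_p[:, c] = (np.log(pi[c])
                       - 0.5 * np.sum(np.log(2 * np.pi * var[c]))
                       - 0.5 * np.sum(diff**2 / var[c], axis=1))
    # Truncate: keep only the C' best clusters per point, renormalize.
    resp = np.zeros((N, C))
    top = np.argpartition(log_p, -C_prime, axis=1)[:, -C_prime:]
    rows = np.arange(N)[:, None]
    sub = log_p[rows, top]
    sub = np.exp(sub - sub.max(axis=1, keepdims=True))
    resp[rows, top] = sub / sub.sum(axis=1, keepdims=True)
    return resp
```

Each row of the returned responsibility matrix sums to one but has at most C′ non-zero entries, which is what makes the subsequent M-step updates sparse.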


Cited by 7 publications (4 citation statements)
References 49 publications

“…This paper introduced a novel clustering algorithm based on GMMs and tailored for large streams of events. The speed and computational efficiency of the proposed algorithm are significantly increased without adding complex extraneous computation, unlike other efficient approximations found in the literature [41]. The algorithm is also more stable than k-means in recovering cluster centers, which is important for reproducibility.…”
Section: Discussion
confidence: 93%
“…As datasets grow larger, it is increasingly difficult to apply traditional clustering methods, and improvements in computational complexity are all the more necessary. Novel algorithms for training mixture models with increased efficiency [40,41] are particularly promising for learning from large datasets.…”
Section: GMMs and Events
confidence: 99%
“…The EM algorithm has an O(nkd) complexity per iteration [9]. The model can be updated on a regular basis without an explosion of the computation time, considering the number of patients entering the intensive care unit for COVID-19.…”
Section: Results
confidence: 99%
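The O(nkd) per-iteration cost quoted above can be made concrete with a plain (untruncated) EM iteration for a diagonal-covariance GMM. This is a generic sketch, not code from any of the cited papers: both the E-step and the closed-form M-step touch every (point, cluster, dimension) triple once.

```python
import numpy as np

def em_iteration(X, mu, var, pi, eps=1e-9):
    """One standard EM iteration for a diagonal-covariance GMM.

    E-step and M-step each cost O(n*k*d), giving the O(nkd)
    per-iteration complexity of classical EM.
    """
    n, d = X.shape
    # E-step: log pi_c + log N(x | mu_c, diag(var_c)), shape (n, k).
    diff = X[:, None, :] - mu[None, :, :]                    # O(nkd)
    log_p = (np.log(pi)[None, :]
             - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)[None, :]
             - 0.5 * np.sum(diff**2 / var[None, :, :], axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)                # stabilize
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: closed-form weighted means, variances, mixing weights.
    Nc = resp.sum(axis=0) + eps                              # (k,)
    mu_new = (resp.T @ X) / Nc[:, None]                      # O(nkd)
    var_new = (resp.T @ X**2) / Nc[:, None] - mu_new**2 + eps
    pi_new = Nc / n
    return mu_new, var_new, pi_new
```

The paper under discussion attacks exactly this nkd product: truncation shrinks the k factor per point, and coresets shrink the effective n.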
“…Several papers propose so-called generalized EM algorithms [34,42,46,52]. The key idea of these generalizations is to replace the maximization steps (3) and (4) by increase steps.…”
Section: Remark 3 (Generalized EM Algorithms)
confidence: 99%