2016
DOI: 10.1109/tkde.2015.2492565

Fast Online EM for Big Topic Modeling

Abstract: The expectation-maximization (EM) algorithm can compute the maximum-likelihood (ML) or maximum a posteriori (MAP) point estimate of mixture models or latent variable models such as latent Dirichlet allocation (LDA), which has been one of the most popular probabilistic topic modeling methods in the past decade. However, batch EM has high time and space complexities to learn big LDA models from big data streams. In this paper, we present a fast online EM (FOEM) algorithm that infers the topic distribution fro…
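The abstract describes FOEM only at a high level, so a minimal sketch of the general online-EM paradigm it builds on may help. The sketch below (Python/NumPy) processes a stream of documents one at a time, keeping only the K × V topic-word statistics in memory and blending each document's sufficient statistics into them with a decaying step size. Everything here (the function name, hyperparameter defaults, and step-size schedule) is an illustrative assumption, not the paper's FOEM algorithm.

```python
import numpy as np

def online_em_lda(doc_stream, K, V, alpha=0.1, beta=0.01,
                  inner_iters=5, kappa=0.7, seed=0):
    """Illustrative online-EM sketch for LDA-style topic estimation.

    Documents are processed one at a time with constant memory: only
    the K x V topic-word statistics survive between documents. This
    sketches the general online-EM paradigm, not the paper's FOEM.
    """
    rng = np.random.default_rng(seed)
    n_kw = rng.random((K, V))                 # running topic-word statistics
    for t, doc in enumerate(doc_stream, start=1):
        ids = np.array([w for w, _ in doc])   # word ids in this document
        cts = np.array([c for _, c in doc], dtype=float)
        # Current global topic-word distributions (smoothed M-step output).
        phi = (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + V * beta)
        theta = np.full(K, 1.0 / K)           # per-document topic proportions
        for _ in range(inner_iters):          # local E-step on this doc only
            resp = phi[:, ids] * theta[:, None]       # K x |doc| responsibilities
            resp /= resp.sum(axis=0, keepdims=True)
            theta = (resp * cts).sum(axis=1) + alpha  # smoothed re-estimate
            theta /= theta.sum()
        # Stochastic M-step: blend this document's statistics into the
        # running average with a decaying (Robbins-Monro style) step size.
        rho = (t + 1.0) ** -kappa
        s_hat = np.zeros((K, V))
        np.add.at(s_hat, (slice(None), ids), resp * cts)
        n_kw = (1.0 - rho) * n_kw + rho * s_hat
    return (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + V * beta)

# Toy usage: three tiny documents over a 6-word vocabulary,
# each document given as a list of (word_id, count) pairs.
docs = [[(0, 2), (1, 1)], [(2, 3), (3, 1)], [(4, 1), (5, 2)]]
phi = online_em_lda(docs, K=2, V=6)
print(phi.round(3))                           # 2 x 6 topic-word matrix
```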


Cited by 20 publications (11 citation statements), with citing publications from 2016 to 2023.
References 26 publications.
“…These paradigms include Expectation-Maximization (EM), online version (e.g., Zeng et al., 2016) and parallel version (e.g., Wang et al., 2015)…”
Section: Methods (mentioning; confidence: 99%)
“…Based on CVB0 (Teh et al., 2007), a stochastic algorithm (SCVB0) was developed to learn human-interpretable topics more accurately and more quickly, both on large and small datasets. SCVB0 has become a standard method whose performance is a benchmark (Zeng et al., 2016)…”
Section: Methods (mentioning; confidence: 99%)
“…The time and memory complexities have been presented in many topic model publications [23,99,36,42,5,17], though the work of [99] provided the most extensive details about time and memory complexities when processing large collections under LDA. Following the work in [99], for D documents each containing N words from a vocabulary of size V, in a particular class c, we obtain a D × V matrix, where NNZ is the total number of nonzero elements in this document-word (sparse) matrix…”
Section: Time and Memory Complexities (mentioning; confidence: 99%)
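To make the quoted notation concrete, here is a small sketch (Python with NumPy/SciPy) of the D × V document-word matrix and its nonzero count. The corpus counts and the topic number K are invented for illustration; the point is that sparse EM-style LDA inference costs on the order of K · NNZ operations per sweep rather than the dense K · D · V.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy corpus (invented numbers): D = 3 documents over a V = 5-word
# vocabulary; rows are documents, columns are vocabulary entries,
# and each entry is a word count.
X = csr_matrix(np.array([[2, 0, 1, 0, 0],
                         [0, 3, 0, 1, 0],
                         [1, 0, 0, 0, 4]]))

D, V = X.shape
nnz = X.nnz        # total number of nonzero elements in the D x V matrix

# EM-style inference touches each nonzero document-word pair once per
# topic per sweep, so a sweep costs O(K * NNZ) time instead of the
# dense O(K * D * V); memory for the data is likewise O(NNZ).
K = 10             # number of topics (arbitrary for this illustration)
print(f"D={D}, V={V}, NNZ={nnz}")
print(f"dense sweep ~ {K * D * V} ops, sparse sweep ~ {K * nnz} ops")
```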