2011
DOI: 10.1007/978-3-642-23808-6_31
Larger Residuals, Less Work: Active Document Scheduling for Latent Dirichlet Allocation

Abstract: Recently, there have been considerable advances in fast inference for latent Dirichlet allocation (LDA). In particular, stochastic optimization of the variational Bayes (VB) objective function with a natural gradient step has been proved to converge and is able to process massive document collections. To reduce noise in the gradient estimate, it considers multiple documents chosen uniformly at random. While it is widely recognized that the scheduling of documents in stochastic optimization may have signifi…
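The abstract contrasts uniform document sampling with scheduling. A minimal sketch of the idea, assuming a residual-proportional sampling rule (the function name, array values, and the exact proportional rule are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

def sample_minibatch(residuals, batch_size, rng):
    # Pick document indices with probability proportional to their
    # residuals; equal residuals reduce this to uniform sampling.
    probs = residuals / residuals.sum()
    return rng.choice(len(residuals), size=batch_size, replace=False, p=probs)

rng = np.random.default_rng(0)
residuals = np.array([0.1, 0.9, 0.5, 0.5])  # illustrative per-document residuals
batch = sample_minibatch(residuals, 2, rng)
```

Documents with larger residuals are more likely to enter the minibatch, so the stochastic natural-gradient step spends its work where the model is furthest from converged.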

Cited by 12 publications (20 citation statements)
References 15 publications (18 reference statements)
“…Moreover, we show that the unified EM framework can explain recent LDA inference algorithms like VB [2], GS [5], CVB [7] and BP [8]. Experiments on four big data streams confirm that FOEM is significantly faster and more memory-efficient than the state-of-the-art online LDA algorithms including OGS [11], OVB [12], RVB [13], SOI [14] and SCVB [15]. We anticipate that the proposed FOEM can also be extended to compute ML or MAP estimates of other mixture models and latent variable models [30].…”
Section: Introduction
confidence: 69%
“…where −w, −d and −(w, d) denote all word indices except w, all document indices except d, and all word and document indices except {w, d}. After the E-step for each word, the M-step will update the sufficient statistics immediately by adding the updated responsibility µ_{w,d}(k) (13) into (14), (15) and (16).…”
Section: Online EM (OEM) for LDA
confidence: 99%
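The statement above describes an incremental E/M cycle: compute the responsibility for one word, then fold it into the sufficient statistics immediately. A minimal sketch under standard LDA notation (all names, hyperparameter values, and the specific count-plus-prior responsibility formula are illustrative assumptions, not taken from the cited paper's equations (13)–(16)):

```python
import numpy as np

K, W = 3, 5                  # number of topics, vocabulary size
alpha, beta = 0.1, 0.01      # Dirichlet hyperparameters
nkw = np.ones((K, W))        # topic-word sufficient statistics
nk = nkw.sum(axis=1)         # per-topic totals
ndk = np.ones(K)             # topic counts for one document d

def online_em_word_update(w, ndk, nkw, nk):
    # E-step: responsibility mu_{w,d}(k) for word w in document d.
    mu = (nkw[:, w] + beta) * (ndk + alpha) / (nk + W * beta)
    mu /= mu.sum()
    # M-step: fold mu into the sufficient statistics immediately,
    # rather than waiting for a full pass over the corpus.
    nkw[:, w] += mu
    nk += mu
    ndk += mu
    return mu

mu = online_em_word_update(2, ndk, nkw, nk)
```

Because the statistics are updated after every word, later words in the same document already see the refined counts, which is what distinguishes online EM from its batch counterpart.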
“…Then, we turned isLDA into a novel, easy-to-implement oLDA approach, called isoLDA, that scales well to massive and growing datasets by applying influence scheduling to randomly formed batches. Based on the results of the present paper, [8] have recently developed the first active LDA.…”
Section: Results
confidence: 99%
“…Thus, a higher communication rate leads to a larger communication cost in parallel online LDA algorithms. Therefore, it is nontrivial to reduce the communication complexity (5) of parallel online LDA algorithms [11], [12], [21], [27], [28] in order to achieve better scalability. Moreover, not all parallel batch LDA algorithms based on MPA have been proved to converge to a local optimum of LDA's objective function.…”
Section: MPA
confidence: 99%