2016
DOI: 10.1109/tkde.2015.2492565

Fast Online EM for Big Topic Modeling

Abstract: The expectation-maximization (EM) algorithm can compute the maximum-likelihood (ML) or maximum a posteriori (MAP) point estimate of mixture models or latent variable models such as latent Dirichlet allocation (LDA), which has been one of the most popular probabilistic topic modeling methods in the past decade. However, batch EM has high time and space complexities to learn big LDA models from big data streams. In this paper, we present a fast online EM (FOEM) algorithm that infers the topic distribution fro…
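The abstract describes FOEM only at a high level, so a minimal sketch of the general online-EM paradigm it builds on may help. The sketch below (Python/NumPy) processes a stream of documents one at a time, keeping only the K × V topic-word statistics in memory and blending each document's sufficient statistics into them with a decaying step size. Everything here (the function name, hyperparameter defaults, and step-size schedule) is an illustrative assumption, not the paper's FOEM algorithm.

```python
import numpy as np

def online_em_lda(doc_stream, K, V, alpha=0.1, beta=0.01,
                  inner_iters=5, kappa=0.7, seed=0):
    """Illustrative online-EM sketch for LDA-style topic estimation.

    Documents are processed one at a time with constant memory: only
    the K x V topic-word statistics survive between documents. This
    sketches the general online-EM paradigm, not the paper's FOEM.
    """
    rng = np.random.default_rng(seed)
    n_kw = rng.random((K, V))                 # running topic-word statistics
    for t, doc in enumerate(doc_stream, start=1):
        ids = np.array([w for w, _ in doc])   # word ids in this document
        cts = np.array([c for _, c in doc], dtype=float)
        # Current global topic-word distributions (smoothed M-step output).
        phi = (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + V * beta)
        theta = np.full(K, 1.0 / K)           # per-document topic proportions
        for _ in range(inner_iters):          # local E-step on this doc only
            resp = phi[:, ids] * theta[:, None]       # K x |doc| responsibilities
            resp /= resp.sum(axis=0, keepdims=True)
            theta = (resp * cts).sum(axis=1) + alpha  # smoothed re-estimate
            theta /= theta.sum()
        # Stochastic M-step: blend this document's statistics into the
        # running average with a decaying (Robbins-Monro style) step size.
        rho = (t + 1.0) ** -kappa
        s_hat = np.zeros((K, V))
        np.add.at(s_hat, (slice(None), ids), resp * cts)
        n_kw = (1.0 - rho) * n_kw + rho * s_hat
    return (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + V * beta)

# Toy usage: three tiny documents over a 6-word vocabulary,
# each document given as a list of (word_id, count) pairs.
docs = [[(0, 2), (1, 1)], [(2, 3), (3, 1)], [(4, 1), (5, 2)]]
phi = online_em_lda(docs, K=2, V=6)
print(phi.round(3))                           # 2 x 6 topic-word matrix
```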


Cited by 20 publications (11 citation statements), with citing publications from 2016 to 2023.
References 26 publications.
“…These paradigms include Expectation-Maximization (EM), online version (e.g., Zeng et al., 2016) and parallel version (e.g., Wang et al., 2015)…”
Section: Methods (mentioning; confidence: 99%)
“…Based on CVB0 (Teh et al., 2007), a stochastic algorithm (SCVB0) was developed to learn human-interpretable topics more accurately and more quickly, both on large and small datasets. SCVB0 has become a standard method whose performance is a benchmark (Zeng et al., 2016)…”
Section: Methods (mentioning; confidence: 99%)
“…The time and memory complexities have been presented in many topic model publications [23,99,36,42,5,17], though the work of [99] provided the most extensive details about time and memory complexities when processing large collections under LDA. Following the work in [99], for D documents each containing N words from a vocabulary of size V, in a particular class c, we obtain a D × V matrix, where NNZ is the total number of nonzero elements in this document-word (sparse) matrix…”
Section: Time and Memory Complexities (mentioning; confidence: 99%)
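To make the quoted notation concrete, here is a small sketch (Python with NumPy/SciPy) of the D × V document-word matrix and its nonzero count. The corpus counts and the topic number K are invented for illustration; the point is that sparse EM-style LDA inference costs on the order of K · NNZ operations per sweep rather than the dense K · D · V.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy corpus (invented numbers): D = 3 documents over a V = 5-word
# vocabulary; rows are documents, columns are vocabulary entries,
# and each entry is a word count.
X = csr_matrix(np.array([[2, 0, 1, 0, 0],
                         [0, 3, 0, 1, 0],
                         [1, 0, 0, 0, 4]]))

D, V = X.shape
nnz = X.nnz        # total number of nonzero elements in the D x V matrix

# EM-style inference touches each nonzero document-word pair once per
# topic per sweep, so a sweep costs O(K * NNZ) time instead of the
# dense O(K * D * V); memory for the data is likewise O(NNZ).
K = 10             # number of topics (arbitrary for this illustration)
print(f"D={D}, V={V}, NNZ={nnz}")
print(f"dense sweep ~ {K * D * V} ops, sparse sweep ~ {K * nnz} ops")
```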