Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1640

Topic Modeling with Wasserstein Autoencoders

Abstract: We propose a novel neural topic model in the Wasserstein autoencoders (WAE) framework. Unlike existing variational autoencoder based models, we directly enforce Dirichlet prior on the latent document-topic vectors. We exploit the structure of the latent space and apply a suitable kernel in minimizing the Maximum Mean Discrepancy (MMD) to perform distribution matching. We discover that MMD performs much better than the Generative Adversarial Network (GAN) in matching high dimensional Dirichlet distribution. We …
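The abstract describes replacing the VAE's KL regularizer with distribution matching in a Wasserstein autoencoder: an encoder maps a bag-of-words document to a topic vector on the simplex, and an MMD penalty pulls the aggregate of those vectors toward a Dirichlet prior. A minimal sketch of that kind of objective; the layer sizes, the RBF kernel, and the penalty weight `lam` are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch of a WAE-style topic-model objective: bag-of-words reconstruction
# loss plus an MMD penalty pulling encoded document-topic vectors toward a
# Dirichlet prior. Sizes, kernel, and `lam` are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, n_topics, lam = 2000, 50, 1.0

encoder = nn.Sequential(nn.Linear(vocab_size, 256), nn.ReLU(),
                        nn.Linear(256, n_topics))
decoder = nn.Linear(n_topics, vocab_size)          # topic-to-word logits

def mmd(x, y, sigma=1.0):
    # Simple biased MMD estimate with an RBF kernel (placeholder; the paper
    # advocates a kernel suited to the Dirichlet-distributed latent space).
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def loss(bow):                                     # bow: (batch, vocab_size) word counts
    theta = F.softmax(encoder(bow), dim=-1)        # inferred document-topic vectors
    recon = -(bow * F.log_softmax(decoder(theta), dim=-1)).sum(-1).mean()
    prior = torch.distributions.Dirichlet(
        torch.full((n_topics,), 0.1)).sample((bow.size(0),))
    return recon + lam * mmd(theta, prior)

# Example forward pass on a random bag-of-words batch.
print(loss(torch.randint(0, 3, (8, vocab_size)).float()))
```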

Cited by 83 publications (77 citation statements) · References 12 publications
“…NVI: the VAE-based approach introduced in Miao et al (2016). Was-A: the Wasserstein Autoencoder model presented in Nan et al (2019). AVITM: the technique published in Srivastava and Sutton (2017), which uses a VAE to approximate the LDA process. AATM: the model that uses an attention-based autoencoder for topic modelling (Tian & Fang, 2019).…”
Section: Experiments and Results (mentioning)
Confidence: 99%
“…This method is based on probability and word distribution, making it difficult to process short text. A similar approach is presented in Nan, Ding, Nallapati, and Xiang (2019), in which a deep neural network is used to model the Dirichlet process in LDA. Work reported in Zhu, Feng, and Li (2018) introduces an approach similar to a clustering process, in which an enhanced graph-based VAE is proposed to capture varied learned words from a large corpus and apply them to the topic modelling problem.…”
Section: Related Work (mentioning)
Confidence: 99%
“…We use the preprocessed 20Newsgroups of (Srivastava and Sutton, 2017), and the preprocessed Grolier and NYTimes of (Wang et al, 2019a). We compare the performance of our model with LDA (Blei et al, 2003), NVDM (Miao et al, 2016), ProdLDA (Srivastava and Sutton, 2017), GraphBTM (Zhu et al, 2018), ATM (Wang et al, 2019a) and W-LDA (Nan et al, 2019) using topic coherence measures (Röder et al, 2015). To quantify the understandability of the extracted topics, a topic coherence measure aggregates the relatedness scores of the topic words (top-weighted words) of each topic, where the word relatedness scores are estimated based on word co-occurrence statistics on a large external corpus.…”
Section: Methods (mentioning)
Confidence: 99%
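The excerpt above describes topic coherence as an aggregate of pairwise word-relatedness scores estimated from co-occurrence counts on a reference corpus. A simplified NPMI-style sketch under that reading; it is not the exact C_V measure of Röder et al. (2015), and the document-level counting scheme and toy corpus are assumptions for illustration:

```python
# Sketch of an NPMI-based coherence score for one topic: average normalized
# pointwise mutual information over all pairs of the topic's top words, with
# probabilities estimated from document co-occurrence in a reference corpus.
from itertools import combinations
from math import log

def npmi_coherence(top_words, reference_docs):
    docs = [set(d) for d in reference_docs]        # each document as a set of tokens
    n = len(docs)
    def p(*words):                                 # fraction of docs containing all words
        return sum(all(w in d for w in words) for d in docs) / n
    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = p(w1), p(w2), p(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)                    # never co-occur: minimum NPMI
        elif p12 == 1.0:
            scores.append(1.0)                     # always co-occur: maximum NPMI
        else:
            pmi = log(p12 / (p1 * p2))
            scores.append(pmi / -log(p12))
    return sum(scores) / len(scores)

# Example: coherence of a topic's top words against a toy reference corpus.
corpus = [["nasa", "space", "launch"], ["space", "orbit", "nasa"], ["stock", "market"]]
print(npmi_coherence(["nasa", "space", "orbit"], corpus))
```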
“…We impose a Dirichlet prior, the conjugate prior of the multinomial distribution, on the latent topic distributions. Following W-LDA (Nan et al, 2019), we achieve this goal by minimizing the Maximum Mean Discrepancy (MMD) (Gretton et al, 2012) between the distribution Q_ẑ of inferred topic distributions ẑ and the Dirichlet prior P_Z from which we draw multinomial noise samples z:…”
Section: Training Objective (mentioning)
Confidence: 99%
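The objective quoted above comes down to estimating MMD between a batch of inferred topic vectors ẑ and a batch of samples z from the Dirichlet prior. A small numerical sketch of the standard MMD² estimator of Gretton et al. (2012); the simplex-oriented diffusion-style kernel, its scale, and the sample sizes are illustrative assumptions rather than the exact choices of the cited work:

```python
# Sketch of estimating MMD^2 between samples from a Dirichlet prior (z ~ P_Z)
# and inferred topic vectors (ẑ ~ Q_ẑ), the penalty used to impose the prior
# on the latent space. The kernel below is a diffusion-style kernel for points
# on the probability simplex; treat its exact form and scale as assumptions.
import numpy as np

def diffusion_kernel(a, b, scale=1.0):
    # Similarity from the inner product of square-rooted distributions,
    # i.e. a geodesic-style distance on the simplex.
    inner = np.clip(np.sqrt(a) @ np.sqrt(b).T, -1.0, 1.0)
    return np.exp(-scale * np.arccos(inner) ** 2)

def mmd2(p_samples, q_samples, kernel=diffusion_kernel):
    # Biased (V-statistic) estimate: E[k(z,z')] + E[k(ẑ,ẑ')] - 2 E[k(z,ẑ)].
    kpp = kernel(p_samples, p_samples).mean()
    kqq = kernel(q_samples, q_samples).mean()
    kpq = kernel(p_samples, q_samples).mean()
    return kpp + kqq - 2.0 * kpq

rng = np.random.default_rng(0)
prior = rng.dirichlet(np.full(50, 0.1), size=256)      # z  ~ P_Z (Dirichlet prior)
posterior = rng.dirichlet(np.full(50, 1.0), size=256)  # stand-in for inferred ẑ ~ Q_ẑ
print(mmd2(prior, posterior))    # larger value => the two distributions differ more
```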