2020
DOI: 10.1162/tacl_a_00326

A Neural Generative Model for Joint Learning Topics and Topic-Specific Word Embeddings

Abstract: We propose a novel generative model to explore both local and global context for joint learning topics and topic-specific word embeddings. In particular, we assume that global latent topics are shared across documents, a word is generated by a hidden semantic vector encoding its contextual semantic meaning, and its context words are generated conditional on both the hidden semantic vector and global latent topics. Topics are trained jointly with the word embeddings. The trained model maps words to topic-dependent embeddings…
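The generative story in the abstract can be made concrete with a small numerical sketch. The code below is a minimal illustration only: the Dirichlet document-topic mixture, Gaussian hidden semantic vectors, softmax emissions over inner products with word embeddings, and all variable names are assumptions chosen for exposition, not the paper's exact parameterization.

```python
# Minimal sketch of the generative story from the abstract (assumption-heavy):
# a document draws a topic mixture, each pivot word is generated from a hidden
# semantic vector, and its context words are generated from both that vector
# and the document's global topics.
import numpy as np

rng = np.random.default_rng(0)

V, K, D = 1000, 20, 50                  # vocabulary size, number of topics, embedding dim
topic_emb = rng.normal(size=(K, D))     # global topic embeddings, shared across documents
word_emb = rng.normal(size=(V, D))      # word embeddings used for emission

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def generate_document(n_words=30, window=2):
    theta = rng.dirichlet(np.ones(K))            # document-level topic mixture
    words, contexts = [], []
    for _ in range(n_words):
        z = rng.normal(size=D)                   # hidden semantic vector for the pivot word
        w = rng.choice(V, p=softmax(word_emb @ z))        # pivot word generated from z
        topic_vec = theta @ topic_emb            # expected topic embedding for this document
        ctx_logits = word_emb @ (z + topic_vec)  # context depends on both z and global topics
        c = rng.choice(V, size=window, p=softmax(ctx_logits))
        words.append(w)
        contexts.append(c)
    return words, contexts
```

Because the context distribution mixes the word's own semantic vector with the document's topics, the same word type can receive different topic-dependent representations in different documents, which is the property the abstract highlights.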

Cited by 10 publications (2 citation statements)
References 22 publications
“…Thompson and Mimno (2020) showed that clustering the contextual representations of a given set of words can produce clusters of semantically related words, which were found to be similar in spirit to LDA topics. The idea of learning topic-specific representations of words has been extensively studied in the context of standard word embeddings (Liu et al., 2015; Li et al., 2016; Shi et al., 2017; Zhu et al., 2020). To the best of our knowledge, learning topic-specific word representations using CLMs has not yet been studied.…”
Section: Related Work (mentioning)
confidence: 99%
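The clustering idea mentioned in this citation statement (Thompson and Mimno, 2020) can be sketched as follows. The sketch assumes that contextual vectors for word occurrences have already been computed by some contextual language model; the `contextual_vecs` array and `occurrence_words` labels are hypothetical stand-ins, not output of any specific pipeline.

```python
# Illustrative sketch: k-means over contextual representations of word occurrences,
# producing topic-like clusters of semantically related words.
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
n_occurrences, dim, n_clusters = 500, 768, 10

# Stand-in for contextual vectors (one vector per token occurrence) from a contextual LM.
contextual_vecs = rng.normal(size=(n_occurrences, dim))
occurrence_words = [f"word_{i % 50}" for i in range(n_occurrences)]   # hypothetical labels

labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(contextual_vecs)

clusters = defaultdict(set)
for word, label in zip(occurrence_words, labels):
    clusters[label].add(word)   # each cluster collects words whose occurrences co-locate
```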
“…Some very recent works use pre-trained BERT (Devlin et al., 2019) either to leverage improved text representations (Bianchi et al., 2020; Sia et al., 2020) or to augment topic models through knowledge distillation (Hoyle et al., 2020a). Zhu et al. (2020) and Dieng et al. (2020) jointly train words and topics in a shared embedding space. However, we train topic-word distribution as part of our model, embed it using word embeddings being learned and use resultant topic embeddings to perform attention over sequentially processed tokens.…”
Section: Introduction (mentioning)
confidence: 99%
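The mechanism described by these citing authors, embedding the topic-word distribution with the word embeddings being learned and letting the resulting topic embeddings attend over token states, can be sketched as below. The shapes, the softmax attention form, and all variable names are assumptions for illustration, not the citing paper's actual architecture.

```python
# Assumption-heavy sketch: topic embeddings derived from a topic-word distribution
# attend over sequentially processed token states.
import numpy as np

rng = np.random.default_rng(1)
V, K, D, T = 1000, 20, 64, 12                   # vocab size, topics, hidden dim, sequence length

word_emb = rng.normal(size=(V, D))              # word embeddings being learned
topic_word = rng.dirichlet(np.ones(V), size=K)  # topic-word distributions, shape (K, V)
token_states = rng.normal(size=(T, D))          # e.g. encoder outputs for one sequence

topic_emb = topic_word @ word_emb               # embed each topic as a weighted sum of word embeddings

scores = topic_emb @ token_states.T             # (K, T) topic-to-token attention scores
scores -= scores.max(axis=1, keepdims=True)
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # softmax over tokens
topic_context = attn @ token_states             # (K, D) topic-specific summaries of the sequence
```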