Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.135
Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!

Abstract: Topic models are a useful analysis tool to uncover the underlying themes within document collections. The dominant approach is to use probabilistic topic models that posit a generative story, but in this paper we propose an alternative way to obtain topics: clustering pretrained word embeddings while incorporating document information for weighted clustering and reranking top words. We provide benchmarks for the combination of different word embeddings and clustering algorithms, and analyse their performance u…
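To make the approach described in the abstract concrete, below is a minimal illustrative sketch of clustering pretrained word embeddings into topics. It is not the authors' implementation: the function name, the frequency-based weighting, and the centroid-distance reranking are assumptions standing in for the "document information for weighted clustering and reranking" the abstract mentions, and KMeans stands in for any of the clustering algorithms the paper benchmarks.

# Minimal sketch: topics as clusters of pretrained word embeddings.
# Assumptions (not from the paper): corpus frequencies serve as cluster weights,
# and top words are reranked by frequency divided by distance to the centroid.
import numpy as np
from sklearn.cluster import KMeans

def cluster_topics(vocab, embeddings, doc_frequencies, num_topics=20, top_k=10):
    # vocab           : list of words
    # embeddings      : (len(vocab), dim) array of pretrained word vectors
    # doc_frequencies : (len(vocab),) array of corpus frequencies (document information)
    weights = doc_frequencies / doc_frequencies.sum()

    # Weighted clustering: frequent words contribute more to the centroids.
    km = KMeans(n_clusters=num_topics, n_init=10, random_state=0)
    labels = km.fit_predict(embeddings, sample_weight=weights)

    topics = []
    for t in range(num_topics):
        idx = np.where(labels == t)[0]
        # Rerank words within the cluster: prefer frequent words close to the centroid.
        dists = np.linalg.norm(embeddings[idx] - km.cluster_centers_[t], axis=1)
        scores = weights[idx] / (dists + 1e-9)
        top = idx[np.argsort(-scores)][:top_k]
        topics.append([vocab[i] for i in top])
    return topics

Any pretrained embedding table (e.g. word2vec, GloVe, or contextual embeddings pooled per word type) can be passed as the embeddings matrix; the resulting lists of top words play the role that per-topic word distributions play in probabilistic topic models.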

Citations: cited by 69 publications (63 citation statements)
References: 19 publications (9 reference statements)
“…The topic of this paper is related to the more fundamental question of how PLMs represent the meaning of complex words in the first place. So far, most studies have focused on methods of representation extraction, using ad-hoc heuristics such as averaging the subword embeddings (Pinter et al., 2020; Sia et al., 2020) or taking the first subword embedding (Devlin et al., 2019; Heinzerling and Strube, 2019; Martin et al., 2020). While not resolving the issue, we lay the theoretical groundwork for more systematic analyses by showing that PLMs can be regarded as serial dual-route models (Caramazza et al., 1988), i.e., the meanings of complex words are either stored or else need to be computed from the subwords.…”
Section: Introduction (mentioning)
confidence: 99%
“…News event tracking has also been framed as a non-parametric topic modeling problem (Zhou et al., 2015), and HDPs that share parameters across temporal batches have been used for this task (Beykikhoshk et al., 2018). Dense document representations have been shown to be useful in the parametric variant of our problem, with neural LDA (Dieng et al., 2019a; Keya et al., 2019; Dieng et al., 2019b; Bianchi et al., 2020), temporal topic evolution models (Zaheer et al., 2017; Gupta et al., 2018; Zaheer et al., 2019; Brochier et al., 2020) and embedding space clustering (Momeni et al., 2018; Sia et al., 2020) being some prominent approaches in the literature.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, our problem setting is completely different: we extract topics from documents in an unsupervised way, where document links/metadata/labels either don't exist or are not used to extract the topics. Some very recent works use pre-trained BERT (Devlin et al., 2019) either to leverage improved text representations (Bianchi et al., 2020; Sia et al., 2020) or to augment topic models through knowledge distillation (Hoyle et al., 2020a). Zhu et al. (2020) and Dieng et al. (2020) jointly train words and topics in a shared embedding space.…”
Section: Introduction (mentioning)
confidence: 99%