2022
DOI: 10.48550/arXiv.2203.01570
Preprint
Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

Abstract: A topic model is often formulated as a generative model that explains how each word of a document is generated given a set of topics and document-specific topic proportions. It focuses on capturing word co-occurrences within a document and hence often performs poorly when analyzing short documents. In addition, its parameter estimation often relies on approximate posterior inference that is either not scalable or suffers from large approximation error. This paper introduces a new topic-modeling…
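To make the generative story in the abstract concrete, here is a minimal numpy sketch under stated assumptions: each topic's word distribution is built by matching a topic embedding against all word embeddings (a softmax over inner products, loosely in the spirit of the title), and each word is then drawn via document-specific topic proportions. All sizes and names (word_emb, topic_emb, generate_document) are illustrative assumptions, not the paper's actual model or its inference procedure, which the truncated abstract does not show.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper).
V, K, E, DOC_LEN = 1000, 10, 50, 20  # vocabulary, topics, embedding dim, words/doc

# Word and topic embeddings; in an embedding-based topic model these are learned.
word_emb = rng.normal(size=(V, E))
topic_emb = rng.normal(size=(K, E))

# Each topic's distribution over words comes from matching its embedding
# against every word embedding: a softmax over inner products.
logits = word_emb @ topic_emb.T             # shape (V, K)
beta = np.exp(logits - logits.max(axis=0))
beta /= beta.sum(axis=0)                    # column k is p(word | topic k)

def generate_document(alpha=0.1):
    # Document-specific topic proportions from a symmetric Dirichlet prior.
    theta = rng.dirichlet([alpha] * K)
    # For each word: pick a topic, then pick a word from that topic.
    z = rng.choice(K, size=DOC_LEN, p=theta)
    words = [int(rng.choice(V, p=beta[:, k])) for k in z]
    return theta, z, words

theta, z, words = generate_document()
print("topic proportions:", np.round(theta, 2))
print("sampled word ids:", words[:10])

Running this yields one synthetic document; fitting such a model from data is the hard part, and the abstract's point is that the usual approximate posterior inference for it is either unscalable or imprecise.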

Cited by 2 publications (2 citation statements)
References 13 publications
“…For a complete picture of the field, readers may refer to the survey by Min et al. (2018). We emphasize deep-clustering-based approaches, which attempt to learn the feature representation of the data while simultaneously discovering the underlying clusters: K-means (Caron et al., 2018), information maximization (Menapace et al., 2020; Ji et al., 2019; Kim and Ha, 2021; Do et al., 2021), transport alignment (Asano et al., 2019; Caron et al., 2020; Wang et al., 2022), neighborhood clustering (Xie et al., 2016; Huang et al., 2019; Dang et al., 2021), contrastive learning (Pan and Kang, 2021; Shen et al., 2021), probabilistic approaches (Monnier et al., 2020; Falck et al., 2021; Manduchi et al., 2021), and kernel density (Yang and Li, 2021). These works primarily focus on clustering data for downstream tasks for a single domain, whereas our clustering algorithm is designed to cluster the data from multiple domains.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…PCAE (Tu et al. 2023) also proposes a flexible generation of the output by VAE, which shares a similar idea, and we focus on VQ embeddings as well. Attempts to include word embeddings, mostly GloVe (Pennington, Socher, and Manning 2014), in generative (Petterson et al. 2010; Dieng, Ruiz, and Blei 2020; Duan et al. 2021) or non-generative (Wang et al. 2022; Xu et al. 2022; Tu et al. 2023) topic modeling frameworks have also demonstrated successful topic modeling performance. Moreover, utilizing pre-trained language models (PLMs) such as BERT (Devlin et al. 2018), RoBERTa (Liu et al. 2019), and XLNet (Yang et al. 2019) has emerged as a new trend in mining topic models.…”
Section: Related Work (citation type: mentioning)
confidence: 99%