2020
DOI: 10.1109/access.2020.2973207
Topic Modeling for Short Texts via Word Embedding and Document Correlation

Abstract: Topic modeling is a widely studied foundational and interesting problem in the text mining domain. Conventional topic models based on word co-occurrences infer the hidden semantic structure from a corpus of documents. However, due to the limited length of short text, data sparsity impedes the inference process of conventional topic models and causes unsatisfactory results on short texts. In fact, each short text usually contains a limited number of topics, and understanding semantic content of short text need…

Cited by 35 publications (25 citation statements) | References 40 publications
“…In this work, we leverage the Topically Driven Neural Language Model (TDLM) (Lau et al., 2017) to obtain topic representations, as it can employ pre-trained embeddings, which are found to be more suitable for short Twitter comments (Yi et al., 2020). The original model of TDLM applies a Convolutional Neural Network (CNN) over word embeddings to generate a comment embedding.…”
Section: Combining Topic Model and HateBERT
confidence: 99%
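The statement above describes TDLM's encoder only at a high level: a CNN applied over pre-trained word embeddings that yields a single comment embedding. The sketch below illustrates that general idea; it is not the TDLM implementation, and the class name, dimensions, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the TDLM code): a 1-D CNN over word embeddings,
# max-pooled into one fixed-size comment embedding.
import torch
import torch.nn as nn

class CommentEncoder(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=300, num_filters=100, kernel_size=3):
        super().__init__()
        # In practice the embedding weights would be initialised from
        # pre-trained vectors (e.g. word2vec/GloVe) and optionally frozen.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size, padding=1)

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)          # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                  # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))           # (batch, num_filters, seq_len)
        return x.max(dim=2).values             # (batch, num_filters) comment embedding

# Toy usage: two short comments padded to length 8.
encoder = CommentEncoder()
batch = torch.randint(0, 30000, (2, 8))
print(encoder(batch).shape)                    # torch.Size([2, 100])
```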
“…A global and local topic model, GLTM [16], integrates word embeddings trained from both a short-text corpus and an auxiliary corpus. TRNMF [17] uses word embeddings to generate a sentence-similarity regularization and integrates it with word co-occurrence information. CME-DMM [31] is a collaborative modeling and embedding framework that incorporates topic and word embeddings.…”
Section: Related Work
confidence: 99%
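The TRNMF description above only names a sentence-similarity regularization built from word embeddings. The following sketch shows one common way such a similarity matrix could be computed (averaged word vectors plus cosine similarity); it is an illustration under that assumption, not the TRNMF algorithm, and the vocabulary and documents are made up.

```python
# Illustrative only: a sentence-similarity matrix from averaged word embeddings,
# the kind of quantity a similarity-based regulariser could consume.
import numpy as np

rng = np.random.default_rng(0)
vocab = {"topic": 0, "model": 1, "short": 2, "text": 3, "embedding": 4}
word_vecs = rng.normal(size=(len(vocab), 50))   # stand-in for pre-trained vectors

def sentence_vec(tokens):
    """Average the embeddings of known tokens; zero vector if none are known."""
    vecs = [word_vecs[vocab[t]] for t in tokens if t in vocab]
    return np.mean(vecs, axis=0) if vecs else np.zeros(word_vecs.shape[1])

docs = [["short", "text", "topic", "model"],
        ["word", "embedding", "topic"],
        ["unrelated", "tokens"]]
S = np.stack([sentence_vec(d) for d in docs])
norms = np.linalg.norm(S, axis=1, keepdims=True)
norms[norms == 0] = 1.0                          # avoid division by zero
sim = (S / norms) @ (S / norms).T                # cosine similarity matrix
print(np.round(sim, 2))
```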
“…But meta-data as auxiliary information is not always available. Recent works prefer to incorporate word embedding information [14]-[17]. But word embeddings trained from an inappropriate auxiliary corpus will lead to poor performance [18].…”
Section: Introduction
confidence: 99%
“…Jiang et al. [18] proposed a novel text classification algorithm based on Ant Colony Optimization (ACO). It exploited the discreteness of the features of text documents and the value ACO provides in addressing discrete problems.…”
Section: Related Work
confidence: 99%