Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2016
DOI: 10.18653/v1/p16-2062

A Latent Concept Topic Model for Robust Topic Inference Using Word Embeddings

Abstract: Uncovering the thematic structure of SNS and blog posts is a crucial yet challenging task because of the severe data sparsity induced by the short length of the texts and the diverse use of vocabulary. This hinders effective topic inference by traditional LDA, which infers topics from document-level co-occurrence of words. To robustly infer topics in such contexts, we propose a latent concept topic model (LCTM). Unlike LDA, LCTM reveals topics via co-occurrence of latent concepts, which we introduce as latent variables…
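
A hedged sketch of the generative story the abstract outlines (the notation below is ours, not taken from the paper): each topic is a distribution over latent concepts, and each concept is a Gaussian over the word-embedding space, so conceptually similar words can be generated by the same concept.

% Illustrative notation; the variable names are our assumption, not the paper's.
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha)            && \text{topic proportions of document } d \\
z_{d,i}  &\sim \mathrm{Categorical}(\theta_d)         && \text{topic of the } i\text{-th word} \\
c_{d,i}  &\sim \mathrm{Categorical}(\phi_{z_{d,i}})   && \text{latent concept drawn from that topic} \\
v_{d,i}  &\sim \mathcal{N}(\mu_{c_{d,i}},\, \sigma^2 I) && \text{word embedding drawn from the concept's Gaussian}
\end{align*}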

Cited by 31 publications (30 citation statements); references 10 publications.
“…The models that incorporate word embeddings, namely LFLDA and LCTM, show inconsistent performance across the two datasets. Contrary to what is reported in [12], we found that LCTM performs worse than LFLDA in half of the cases, potentially because of the noisy nature of tweets and its adverse effect on constructing latent concepts. In general, the two online models perform reasonably well on this task.…”
Section: Tweet Clustering Evaluation (contrasting)
confidence: 99%
“…[18] propose to incorporate word embeddings into topic inference through the generalised Pólya urn model. [12] propose to infer topics via document-level co-occurrence patterns of latent concepts rather than of the words themselves. All of these approaches aim to improve topic coherence by connecting semantically related words, so as to overcome the short length of tweets.…”
Section: Related Work (mentioning)
confidence: 99%
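
To illustrate the mechanism the excerpt describes (a hedged sketch; the function and array names are ours, not from [12] or [18]): if each word is mapped to its nearest latent concept in embedding space, the topic model counts concept co-occurrences rather than word co-occurrences, so rare but semantically related words pool their statistics.

import numpy as np

def assign_concepts(word_vecs, concept_means):
    """Map each word embedding to the ID of its nearest concept center.

    word_vecs:     (V, D) word-embedding matrix.
    concept_means: (C, D) concept centers, e.g. initialized by k-means.
    """
    # Squared Euclidean distance from every word to every concept center.
    dists = ((word_vecs[:, None, :] - concept_means[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)  # (V,) concept ID per word

# A document of word IDs becomes a document of concept IDs, so words like
# "soccer" and "football" can contribute to the same co-occurrence counts.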
“…To establish the connection between topics, reviewer profiles, and submissions, we assume that the topic vectors can be written as a linear combination of the embeddings of the component words in either the reviewer profiles or the submissions. This assumption is supported by a geometric property of word embeddings: the weighted sum of the component word embeddings has been shown to be a robust and efficient representation of sentences and documents (Mikolov et al., 2013b). Intuitively, the extracted common topics would be highly correlated, in terms of semantic similarity, with a subset of the words in the reviewer profile or the submission.…”
Section: Modeling (mentioning)
confidence: 99%
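
A minimal sketch of the weighted-sum representation the excerpt appeals to (the weighting scheme and names here are illustrative assumptions, not the cited model's):

import numpy as np

def doc_embedding(token_ids, embeddings, weights=None):
    """Represent a document as a weighted average of its word embeddings.

    token_ids:  indices of the document's words into the embedding matrix.
    embeddings: (V, D) word-embedding matrix.
    weights:    optional per-token weights (e.g. TF-IDF); uniform if None.
    """
    vecs = embeddings[np.asarray(token_ids)]  # (n, D) embeddings of the tokens
    if weights is None:
        weights = np.ones(len(token_ids))
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * vecs).sum(axis=0) / weights.sum()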
“…One limitation of LDA is that it does not exploit the potential semantic relatedness between words in a topic, because it assumes that words are generated independently (Xie et al., 2015). Variants of LDA have been proposed that incorporate notions of semantic coherence for more effective topic modeling (Hu and Tsujii, 2016; Das et al., 2015; Xun et al., 2017). Beyond probabilistic models of topics, Jin et al. sought to capture the temporal changes in reviewer interest, as well as the stability of the interest trend, with probabilistic modeling (Jin et al., 2017).…”
Section: Related Work (mentioning)
confidence: 99%