2015
DOI: 10.1162/tacl_a_00140

Improving Topic Models with Latent Feature Word Representations

Abstract: Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora…
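The model described in the abstract mixes a Dirichlet-multinomial topic-word distribution with a latent-feature component defined over pre-trained word vectors. Below is a minimal sketch of that two-component word distribution, assuming a fixed mixture weight `lam`; the variable names (`phi`, `topic_vecs`, `word_vecs`) are illustrative, not from the paper's code:

```python
import numpy as np

def word_distribution(topic_id, phi, topic_vecs, word_vecs, lam=0.6):
    """Per-topic word distribution mixing a Dirichlet-multinomial
    component with a latent-feature (embedding) component.

    phi        : (T, V) Dirichlet-multinomial topic-word probabilities
    topic_vecs : (T, D) learned topic embeddings
    word_vecs  : (V, D) pre-trained word embeddings, kept fixed
    lam        : mixture weight of the latent-feature component
    """
    # Latent-feature component: softmax over the vocabulary of dot
    # products between the topic vector and each word vector.
    logits = word_vecs @ topic_vecs[topic_id]
    logits -= logits.max()                      # numerical stability
    lf = np.exp(logits)
    lf /= lf.sum()

    # With probability lam the word comes from the embedding
    # component, otherwise from the Dirichlet-multinomial component.
    return lam * lf + (1.0 - lam) * phi[topic_id]

# Example: sample one word for topic 3 from a toy model.
rng = np.random.default_rng(0)
V, T, D = 1000, 10, 50
phi = rng.dirichlet(np.ones(V), size=T)
topic_vecs = rng.normal(size=(T, D))
word_vecs = rng.normal(size=(V, D))
p = word_distribution(3, phi, topic_vecs, word_vecs)
w = rng.choice(V, p=p)
```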

Cited by 267 publications (173 citation statements)
References 29 publications (40 reference statements)
“…Similarly, "Brisbane" (in Australia) is usually abbreviated as "BNE" and called "Brissie". As a result, current text mining approaches (e.g., topic modeling [13] [14] and other heuristics [15] [16]) may not gain sufficient statistical signal and may mismatch the textual content of similar authors. Consequently, the correlation edge weight between the pair of authors will be calculated incorrectly.…”
Section: Challenge 1 (Mismatched Author Contents)
Citation type: mentioning
confidence: 99%
“…Vector representations come in other forms: Paragraph2Vec [37], ConceptVector [30], Category2Vec [38], Prod2Vec [31]. Moreover, [13] extends topic models so that each word is generated from either the Dirichlet multinomial component or the embedding module. [38] enriches the embedding with Knowledge Graphs to eliminate ambiguity and improve similarity measures.…”
Section: Word Embedding
Citation type: mentioning
confidence: 99%
“…Chen et al developed MDK-LDA, a variant of LDA that incorporates domain knowledge directly to provide better topic descriptors [7]. Furthermore, approaches that combine word embeddings with topic modeling can be beneficial for learning both models jointly [42], as well as for improving topic model representations of short texts through word embeddings [36,43,58], or for creating improved word embeddings using LDA [46].…”
Section: Semantic Interaction
Citation type: mentioning
confidence: 99%
“…Specifically, in the document classification task, topics are used as features of documents with values P(t | d). These features are used for training a classifier [7,16,17]. In the document clustering task, each topic is considered a cluster and each document is assigned to its most probable topic [16,18]. For the analyses in Section 7, following common practice (e.g., [16,19,20]), we use Purity and Normalized Mutual Information in the clustering task, and Accuracy as our prime evaluation metric in the classification task.…”
Section: Evaluating Topic Models
Citation type: mentioning
confidence: 99%
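The evaluation protocol in this excerpt (most-probable-topic clustering scored with Purity and NMI, plus classification accuracy over P(t | d) features) can be sketched as follows. This is an illustrative implementation, assuming integer gold labels and a scikit-learn logistic regression in place of whichever classifier the cited works actually use:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, normalized_mutual_info_score

def purity(labels_true, labels_pred):
    """Assign each cluster its majority gold label; report the
    fraction of documents covered by those labels."""
    total = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        total += np.bincount(members).max()
    return total / len(labels_true)

def evaluate_topics(theta, labels, train_idx, test_idx):
    """theta  : (N, T) per-document topic proportions P(t | d)
    labels : (N,) integer gold classes (numpy array)."""
    # Clustering: each document is assigned to its most probable topic.
    clusters = theta.argmax(axis=1)
    pur = purity(labels, clusters)
    nmi = normalized_mutual_info_score(labels, clusters)

    # Classification: topic proportions serve as document features.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(theta[train_idx], labels[train_idx])
    acc = accuracy_score(labels[test_idx], clf.predict(theta[test_idx]))
    return {"purity": pur, "nmi": nmi, "accuracy": acc}
```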