Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d15-1037

Efficient Methods for Incorporating Knowledge into Topic Models

Abstract: Latent Dirichlet allocation (LDA) is a popular topic modeling technique for exploring hidden topics in text corpora. Increasingly, topic modeling needs to scale to larger topic spaces and use richer forms of prior knowledge, such as word correlations or document labels. However, inference is cumbersome for LDA models with prior knowledge. As a result, LDA models that use prior knowledge only work in small-scale scenarios. In this work, we propose a factor graph framework, Sparse Constrained LDA (SC-LDA), for e…
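As background for the abstract's discussion of LDA inference, here is a minimal collapsed Gibbs sampler for plain LDA (without the prior-knowledge constraints SC-LDA adds). This is an illustrative sketch, not the paper's implementation; the function name and toy corpus format are assumptions.

```python
import random

def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (no prior knowledge).

    docs: list of documents, each a list of integer word ids.
    Returns per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    # z[d][i]: topic assigned to the i-th token of document d
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # remove the token's current assignment from the counts
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # full conditional p(z = t | all other assignments)
                weights = [
                    (doc_topic[d][t] + alpha) *
                    (topic_word[t][w] + beta) /
                    (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word
```

The per-token resampling loop is exactly where the cost of adding prior knowledge shows up: richer priors change the full-conditional `weights` computation, which motivates the paper's focus on efficient inference.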

Cited by 43 publications (31 citation statements). References 24 publications.
“…More broadly, there are many efforts to improve the semantic interpretability of topic models. In particular, much work has improved topic quality via different priors: Wallach et al. show the effectiveness of general asymmetric priors to improve topic quality, Newman et al. use an informative prior capturing short-range dependencies between words, and Andrzejewski et al. use Dirichlet Forest priors to capture corpus structure.…”
Section: Related Work
confidence: 99%
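The citation statement above mentions asymmetric priors (Wallach et al.). In a collapsed Gibbs sampler, that amounts to replacing the scalar document-topic hyperparameter with a per-topic vector in the full conditional. A minimal sketch, with an assumed helper name and count layout:

```python
def topic_weights(doc_topic_d, topic_word_w, topic_total, alpha, beta, V):
    """Unnormalized full conditional for one token under an asymmetric
    document-topic prior: alpha is a per-topic vector, not a scalar.

    doc_topic_d: topic counts for the current document.
    topic_word_w: per-topic counts for the current word.
    topic_total: total token counts per topic; V: vocabulary size.
    """
    return [
        (doc_topic_d[t] + alpha[t]) *
        (topic_word_w[t] + beta) / (topic_total[t] + V * beta)
        for t in range(len(alpha))
    ]
```

With a symmetric prior every `alpha[t]` is identical; letting the vector be asymmetric lets a few "background" topics absorb common words, which is the effect Wallach et al. document.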
“…publication year and publication type) can be included as covariates which inform either document-topic proportions (topic prevalence) or topic-term probabilities (topic content) (Roberts et al., 2014: 5). Topic models with document metadata covariates have been shown to produce more coherent and domain-specific topics (Yang et al., 2015), and to perform better in terms of statistical quantities of interest, e.g. calculated covariate relationships with uncertainty estimates (Roberts et al., 2014).…”
Section: Model Specification and Estimation
confidence: 99%
“…As the acceptance of topic coherence measures increases as a means of topic model assessment (Paul and Girju, 2010; Reisinger et al., 2010; Hall et al., 2012), recent research trends focus on proposing fast and efficient models that can scale up to large amounts of data (Yang et al., 2015; Nguyen et al., 2015), using the whole text per document for training.…”
Section: Related Work
confidence: 99%
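The topic coherence measures referenced above are typically computed from word co-occurrence statistics. A minimal sketch of one common variant, normalized PMI (NPMI) averaged over word pairs from a topic's top words — the function name and document-set input format are illustrative assumptions:

```python
import math
from itertools import combinations

def npmi_coherence(top_words, doc_sets, eps=1e-12):
    """Average NPMI over pairs of a topic's top words.

    top_words: the topic's highest-probability words.
    doc_sets: one set of words per document (co-occurrence reference corpus).
    Scores range from -1 (never co-occur) to +1 (always co-occur).
    """
    n = len(doc_sets)

    def prob(words):
        # fraction of documents containing all of `words`
        return sum(1 for d in doc_sets if all(w in d for w in words)) / n

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p_joint = prob([w1, w2])
        if p_joint == 0:
            scores.append(-1.0)  # pair never co-occurs: minimum NPMI
            continue
        pmi = math.log(p_joint / (prob([w1]) * prob([w2]) + eps))
        scores.append(pmi / -math.log(p_joint + eps))
    return sum(scores) / len(scores)
```

Higher average NPMI for a topic's top words correlates with human judgments of topic quality, which is why such measures are used to assess the models cited above.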