Proceedings of the 26th Annual International Conference on Machine Learning 2009
DOI: 10.1145/1553374.1553378
Incorporating domain knowledge into topic modeling via Dirichlet Forest priors

Abstract: Users of topic modeling methods often have knowledge about the composition of words that should have high or low probability in various topics. We incorporate such domain knowledge using a novel Dirichlet Forest prior in a Latent Dirichlet Allocation framework. The prior is a mixture of Dirichlet tree distributions with special structures. We present its construction, and inference via collapsed Gibbs sampling. Experiments on synthetic and real datasets demonstrate our model’s ability to follow and generalize …
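To make the abstract's construction concrete, here is a minimal sketch in Python of how one must-link set can be encoded as a Dirichlet tree: the linked words sit under an internal node, and a strength parameter eta inflates the within-node edge weights so the words receive similar probability. The toy vocabulary and parameter values are assumptions for illustration; this is not the authors' code.

```python
import numpy as np

# Sketch: sample a topic-word distribution phi from a Dirichlet tree
# that encodes one must-link set (toy vocabulary; beta, eta assumed).
rng = np.random.default_rng(0)

vocab = ["apple", "banana", "cherry", "date"]
must_link = {"apple", "banana"}        # these words should get similar mass
beta, eta = 0.5, 100.0                 # base concentration / link strength

linked = [w for w in vocab if w in must_link]
free = [w for w in vocab if w not in must_link]

# Root edges: weight |linked|*beta to the must-link subtree (so its total
# mass behaves like an ordinary Dirichlet), and weight beta per free word.
root = rng.dirichlet([beta * len(linked)] + [beta] * len(free))

# Inside the subtree, symmetric weights eta*beta concentrate the split
# near uniform for large eta, giving the linked words similar probability.
within = rng.dirichlet([eta * beta] * len(linked))

phi = {w: root[0] * p for w, p in zip(linked, within)}
phi.update(zip(free, root[1:]))
print(phi)   # apple and banana end up with near-equal probabilities
```

Because the Dirichlet tree is conjugate to the multinomial, these tree parameters can be integrated out, which is what the paper's collapsed Gibbs sampler exploits.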

Cited by 332 publications (236 citation statements); references 10 publications.
“…Another sort of guidance is to specify which terms should have similar probabilities in one topic (must-link) and which terms should not have similar probabilities in any topic (cannot-link). This kind of prior can be modeled as a Dirichlet forest prior, which is discussed in [4].…”
Section: Probabilistic Models With Constraints
confidence: 99%
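As context for the quote above, a hypothetical checker makes the two constraint types concrete. This is a sketch, not code from [4]: the function name, thresholds, and the reading of cannot-link as "not both high-probability in the same topic" are assumptions.

```python
import numpy as np

# Hypothetical constraint checker for a topics-by-vocab matrix phi.
# must-link (i, j): i and j get similar probability within each topic.
# cannot-link (i, j): i and j are never both high-probability in a topic.
def satisfies(phi, must_links, cannot_links, ratio=2.0, high=0.1):
    for t in range(phi.shape[0]):
        for i, j in must_links:
            lo_p, hi_p = sorted((phi[t, i], phi[t, j]))
            if hi_p > ratio * lo_p:          # probabilities too dissimilar
                return False
        for i, j in cannot_links:
            if phi[t, i] > high and phi[t, j] > high:
                return False                 # both prominent in one topic
    return True

phi = np.array([[0.40, 0.38, 0.02, 0.20],    # two toy topics, four words
                [0.05, 0.06, 0.80, 0.09]])
print(satisfies(phi, must_links=[(0, 1)], cannot_links=[(2, 3)]))  # True
```

The Dirichlet forest prior of [4] enforces such preferences softly, through the prior, rather than as hard constraints like this checker does.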
“…While these models were not found to be helpful for document smoothing [124], rich hierarchical topics may be beneficial when combined with the explicit user feedback present in our approach. Our approach could also exploit prior information such as predefined concepts by using topic model variants which can incorporate domain knowledge [24,5].…”
Section: Future Work
confidence: 99%
“…To the best of our knowledge, this is the first constrained LDA model which can process large-scale constraints in the form of must-links and cannot-links. There are two existing works by Andrzejewski and Zhu [1,2] that are related to the proposed model. However, [1] only considers must-link constraints.…”
Section: Introduction
confidence: 99%
“…However, [1] only considers must-link constraints. In [2], the number of maximal cliques grows exponentially in the process of encoding constraints. Thus, [2] cannot process a large number of constraints (see Section 2.1).…”
Section: Introduction
confidence: 99%
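The exponential blow-up this quote refers to can be seen directly. In the Dirichlet Forest encoding, cannot-links contribute one Dirichlet tree per maximal clique of the complement of the cannot-link graph, and a cannot-link graph that is a perfect matching on 2k words already yields 2**k maximal cliques. The sketch below (an assumed illustration using networkx, not code from [1,2]) counts them:

```python
import networkx as nx

# Count maximal cliques of the complement of a cannot-link graph that is
# a perfect matching on 2k words: each maximal clique picks one word per
# pair, so the count is 2**k, exponential in the number of constraints.
for k in range(1, 6):
    cannot = nx.Graph()
    cannot.add_edges_from((2 * i, 2 * i + 1) for i in range(k))
    complement = nx.complement(cannot)
    n_cliques = sum(1 for _ in nx.find_cliques(complement))
    print(f"k={k}: {n_cliques} maximal cliques (2**k = {2 ** k})")
```

This is the growth that, per the quote, prevents [2] from handling large numbers of cannot-link constraints.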