Proceedings of the 26th Annual International Conference on Machine Learning 2009
DOI: 10.1145/1553374.1553378
Incorporating domain knowledge into topic modeling via Dirichlet Forest priors

Abstract: Users of topic modeling methods often have knowledge about the composition of words that should have high or low probability in various topics. We incorporate such domain knowledge using a novel Dirichlet Forest prior in a Latent Dirichlet Allocation framework. The prior is a mixture of Dirichlet tree distributions with special structures. We present its construction, and inference via collapsed Gibbs sampling. Experiments on synthetic and real datasets demonstrate our model’s ability to follow and generalize …
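To make the abstract's construction concrete, here is a minimal sketch in Python of how one must-link set can be encoded as a Dirichlet tree: the linked words sit under an internal node, and a strength parameter eta inflates the within-node edge weights so the words receive similar probability. The toy vocabulary and parameter values are assumptions for illustration; this is not the authors' code.

```python
import numpy as np

# Sketch: sample a topic-word distribution phi from a Dirichlet tree
# that encodes one must-link set (toy vocabulary; beta, eta assumed).
rng = np.random.default_rng(0)

vocab = ["apple", "banana", "cherry", "date"]
must_link = {"apple", "banana"}        # these words should get similar mass
beta, eta = 0.5, 100.0                 # base concentration / link strength

linked = [w for w in vocab if w in must_link]
free = [w for w in vocab if w not in must_link]

# Root edges: weight |linked|*beta to the must-link subtree (so its total
# mass behaves like an ordinary Dirichlet), and weight beta per free word.
root = rng.dirichlet([beta * len(linked)] + [beta] * len(free))

# Inside the subtree, symmetric weights eta*beta concentrate the split
# near uniform for large eta, giving the linked words similar probability.
within = rng.dirichlet([eta * beta] * len(linked))

phi = {w: root[0] * p for w, p in zip(linked, within)}
phi.update(zip(free, root[1:]))
print(phi)   # apple and banana end up with near-equal probabilities
```

Because the Dirichlet tree is conjugate to the multinomial, these tree parameters can be integrated out, which is what the paper's collapsed Gibbs sampler exploits.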

Cited by 332 publications (236 citation statements); references 10 publications.
“…Another sort of guidance is to specify which terms should have similar probabilities in one topic (must-link) and which terms should not have similar probabilities in any topic (cannot-link). This kind of prior can be modeled as a Dirichlet forest prior, which is discussed in [4].…”
Section: Probabilistic Models With Constraints
confidence: 99%
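As context for the quote above, a hypothetical checker makes the two constraint types concrete. This is a sketch, not code from [4]: the function name, thresholds, and the reading of cannot-link as "not both high-probability in the same topic" are assumptions.

```python
import numpy as np

# Hypothetical constraint checker for a topics-by-vocab matrix phi.
# must-link (i, j): i and j get similar probability within each topic.
# cannot-link (i, j): i and j are never both high-probability in a topic.
def satisfies(phi, must_links, cannot_links, ratio=2.0, high=0.1):
    for t in range(phi.shape[0]):
        for i, j in must_links:
            lo_p, hi_p = sorted((phi[t, i], phi[t, j]))
            if hi_p > ratio * lo_p:          # probabilities too dissimilar
                return False
        for i, j in cannot_links:
            if phi[t, i] > high and phi[t, j] > high:
                return False                 # both prominent in one topic
    return True

phi = np.array([[0.40, 0.38, 0.02, 0.20],    # two toy topics, four words
                [0.05, 0.06, 0.80, 0.09]])
print(satisfies(phi, must_links=[(0, 1)], cannot_links=[(2, 3)]))  # True
```

The Dirichlet forest prior of [4] enforces such preferences softly, through the prior, rather than as hard constraints like this checker does.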
“…While these models were not found to be helpful for document smoothing [124], rich hierarchical topics may be beneficial when combined with the explicit user feedback present in our approach. Our approach could also exploit prior information such as predefined concepts by using topic model variants which can incorporate domain knowledge [24,5].…”
Section: Future Work
confidence: 99%
“…To the best of our knowledge, this is the first constrained LDA model which can process large-scale constraints in the form of must-links and cannot-links. There are two existing works by Andrzejewski and Zhu [1,2] that are related to the proposed model. However, [1] only considers must-link constraints.…”
Section: Introduction
confidence: 99%
“…However, [1] only considers must-link constraints. In [2], the number of maximal cliques grows exponentially in the process of encoding constraints. Thus, [2] cannot process a large number of constraints (see Section 2.1).…”
Section: Introduction
confidence: 99%
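The exponential blow-up this quote refers to can be seen directly. In the Dirichlet Forest encoding, cannot-links contribute one Dirichlet tree per maximal clique of the complement of the cannot-link graph, and a cannot-link graph that is a perfect matching on 2k words already yields 2**k maximal cliques. The sketch below (an assumed illustration using networkx, not code from [1,2]) counts them:

```python
import networkx as nx

# Count maximal cliques of the complement of a cannot-link graph that is
# a perfect matching on 2k words: each maximal clique picks one word per
# pair, so the count is 2**k, exponential in the number of constraints.
for k in range(1, 6):
    cannot = nx.Graph()
    cannot.add_edges_from((2 * i, 2 * i + 1) for i in range(k))
    complement = nx.complement(cannot)
    n_cliques = sum(1 for _ in nx.find_cliques(complement))
    print(f"k={k}: {n_cliques} maximal cliques (2**k = {2 ** k})")
```

This is the growth that, per the quote, prevents [2] from handling large numbers of cannot-link constraints.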