2010 · DOI: 10.1145/1658377.1658381

Learning author-topic models from text corpora

Abstract: We propose a new unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage stochastic process. An author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words. The probability distribution over topics in a multi-author paper is a mixture of the distributions associated with the authors. The topic-word and author-topic distributions…
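To make the two-stage generative story concrete, here is a minimal sketch in Python/NumPy. The vocabulary size, topic and author counts, Dirichlet hyperparameters, and document length are illustrative assumptions, not values from the paper; the uniform choice of a co-author per word is what makes a multi-author document's topic distribution a mixture of its authors' distributions.

import numpy as np

rng = np.random.default_rng(1)

V, T, A = 100, 5, 4                        # vocab, topics, authors (toy sizes)
theta = rng.dirichlet(np.full(T, 0.1), A)  # stage 1: per-author topic dist.
phi = rng.dirichlet(np.full(V, 0.01), T)   # stage 2: per-topic word dist.

def generate_doc(author_ids, n_words):
    # For each word: pick a co-author uniformly, draw a topic from that
    # author's distribution, then draw a word from that topic.
    words = []
    for _ in range(n_words):
        a = rng.choice(author_ids)         # uniform over the co-authors
        z = rng.choice(T, p=theta[a])      # topic from the author's mixture
        w = rng.choice(V, p=phi[z])        # word from the topic
        words.append(w)
    return words

doc = generate_doc([0, 2], 50)             # a toy two-author paper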


Cited by 272 publications (198 citation statements) · References 43 publications
“…Many topic models and related studies have been proposed [19,26,28], mainly motivated by the probabilistic latent semantic analysis (PLSA) model [14] or the latent Dirichlet allocation (LDA) model [3]. For instance, some models extract topics from the perspective of authors (Mimno and McCallum [20], Rosen-Zvi et al. [24], Steyvers et al. [25]). These models commonly assume that authors have topic distributions.…”
Section: Introduction (mentioning)
confidence: 99%
“…This facilitates the comparison of the results (some principles for choosing an appropriate number of topics are discussed in [4]). We follow the suggestions from [9] and set α = 50/T and β = 0.01. In Figure 4, the x-axis denotes the match threshold and the y-axis the degree of match between real and discovered topics, as defined in Section 3.…”
Section: Results (mentioning)
confidence: 99%
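As an aside on the quoted heuristic: setting α = 50/T keeps the total symmetric Dirichlet prior mass (T × α = 50) constant as the number of topics grows, so each individual topic's prior share shrinks proportionally. A small sketch, with arbitrary example topic counts:

import numpy as np

beta = 0.01                        # symmetric prior on topic-word distributions
for T in (10, 50, 100):            # arbitrary example topic counts
    alpha = 50.0 / T               # per-topic prior shrinks as T grows...
    alpha_vec = np.full(T, alpha)  # ...while total prior mass stays at 50
    print(f"T={T:3d}  alpha={alpha:5.2f}  total mass={alpha_vec.sum():.0f}")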
“…In the AT model, each word w in a document is associated with two latent variables: an author x and a topic z [9].…”
Section: Endfor (mentioning)
confidence: 99%
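The quoted passage describes the key step of collapsed Gibbs sampling for the AT model: every word token carries a joint latent (author, topic) assignment, resampled from counts over all other tokens. Below is a minimal, self-contained sketch of that sampler; the toy corpus, dimensions, and priors are assumptions for illustration, not the paper's experimental setup.

import numpy as np

rng = np.random.default_rng(0)

V, T, A = 50, 4, 3                 # vocabulary size, topics, authors (toy)
alpha, beta = 50.0 / T, 0.01       # priors as in the heuristic quoted above

# Toy corpus: each document is (word ids, author ids of its co-authors).
docs = [(rng.integers(0, V, 40).tolist(), [0, 1]),
        (rng.integers(0, V, 40).tolist(), [1, 2]),
        (rng.integers(0, V, 40).tolist(), [2])]

C_WT = np.zeros((V, T))            # C_WT[w, t]: word w assigned to topic t
C_AT = np.zeros((A, T))            # C_AT[a, t]: author a assigned to topic t
assign = []                        # per-token (author, topic) assignments

# Random initialization of (x, z) for every word token.
for words, authors in docs:
    doc_assign = []
    for w in words:
        x, z = rng.choice(authors), rng.integers(T)
        C_WT[w, z] += 1
        C_AT[x, z] += 1
        doc_assign.append((x, z))
    assign.append(doc_assign)

for sweep in range(200):           # collapsed Gibbs sweeps
    for d, (words, authors) in enumerate(docs):
        for i, w in enumerate(words):
            x, z = assign[d][i]
            C_WT[w, z] -= 1        # exclude the current token's counts
            C_AT[x, z] -= 1
            # P(x, z | rest) ∝ (C_WT[w,z] + β) / (Σ_w' C_WT[w',z] + V·β)
            #               × (C_AT[x,z] + α) / (Σ_t' C_AT[x,t'] + T·α)
            p_wz = (C_WT[w] + beta) / (C_WT.sum(axis=0) + V * beta)
            p_at = (C_AT[authors] + alpha) / (
                C_AT[authors].sum(axis=1, keepdims=True) + T * alpha)
            p = (p_at * p_wz).ravel()
            k = rng.choice(p.size, p=p / p.sum())
            x, z = authors[k // T], k % T
            C_WT[w, z] += 1
            C_AT[x, z] += 1
            assign[d][i] = (x, z)

# Point estimate of each author's topic distribution after sampling.
theta = (C_AT + alpha) / (C_AT.sum(axis=1, keepdims=True) + T * alpha)
print(np.round(theta, 2))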