2010 · DOI: 10.1145/1658377.1658381

Learning author-topic models from text corpora

Abstract: We propose a new unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage stochastic process. An author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words. The probability distribution over topics in a multi-author paper is a mixture of the distributions associated with the authors. The topic-word and author-topic distributions…
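To make the two-stage generative story concrete, here is a minimal sketch in Python/NumPy. The vocabulary size, topic and author counts, Dirichlet hyperparameters, and document length are illustrative assumptions, not values from the paper; the uniform choice of a co-author per word is what makes a multi-author document's topic distribution a mixture of its authors' distributions.

import numpy as np

rng = np.random.default_rng(1)

V, T, A = 100, 5, 4                        # vocab, topics, authors (toy sizes)
theta = rng.dirichlet(np.full(T, 0.1), A)  # stage 1: per-author topic dist.
phi = rng.dirichlet(np.full(V, 0.01), T)   # stage 2: per-topic word dist.

def generate_doc(author_ids, n_words):
    # For each word: pick a co-author uniformly, draw a topic from that
    # author's distribution, then draw a word from that topic.
    words = []
    for _ in range(n_words):
        a = rng.choice(author_ids)         # uniform over the co-authors
        z = rng.choice(T, p=theta[a])      # topic from the author's mixture
        w = rng.choice(V, p=phi[z])        # word from the topic
        words.append(w)
    return words

doc = generate_doc([0, 2], 50)             # a toy two-author paper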


Cited by 272 publications (198 citation statements) · References 43 publications
“…Many topic models and related studies have been proposed [19,26,28], mainly motivated by the probabilistic latent semantic analysis (PLSA) model [14] or the latent Dirichlet allocation (LDA) model [3]. For instance, some models extract topics from the perspective of authors (Mimno and McCallum [20], Rosen-Zvi et al. [24], Steyvers et al. [25]). These models commonly assume that authors have topic distributions.…”
Section: Introduction (mentioning)
confidence: 99%
“…This facilitates the comparison of the results (some principles for choosing an appropriate number of topics are discussed in [4]). We follow the suggestions from [9] and set α = 50/T and β = 0.01. In Figure 4, the x-axis denotes the match threshold and the y-axis the degree of match between real and discovered topics, as defined in Section 3.…”
Section: Results (mentioning)
confidence: 99%
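As an aside on the quoted heuristic: setting α = 50/T keeps the total symmetric Dirichlet prior mass (T × α = 50) constant as the number of topics grows, so each individual topic's prior share shrinks proportionally. A small sketch, with arbitrary example topic counts:

import numpy as np

beta = 0.01                        # symmetric prior on topic-word distributions
for T in (10, 50, 100):            # arbitrary example topic counts
    alpha = 50.0 / T               # per-topic prior shrinks as T grows...
    alpha_vec = np.full(T, alpha)  # ...while total prior mass stays at 50
    print(f"T={T:3d}  alpha={alpha:5.2f}  total mass={alpha_vec.sum():.0f}")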
“…In the AT model, each word w in a document is associated with two latent variables: an author x and a topic z [9].…”
Section: Endfor (mentioning)
confidence: 99%
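The quoted passage describes the key step of collapsed Gibbs sampling for the AT model: every word token carries a joint latent (author, topic) assignment, resampled from counts over all other tokens. Below is a minimal, self-contained sketch of that sampler; the toy corpus, dimensions, and priors are assumptions for illustration, not the paper's experimental setup.

import numpy as np

rng = np.random.default_rng(0)

V, T, A = 50, 4, 3                 # vocabulary size, topics, authors (toy)
alpha, beta = 50.0 / T, 0.01       # priors as in the heuristic quoted above

# Toy corpus: each document is (word ids, author ids of its co-authors).
docs = [(rng.integers(0, V, 40).tolist(), [0, 1]),
        (rng.integers(0, V, 40).tolist(), [1, 2]),
        (rng.integers(0, V, 40).tolist(), [2])]

C_WT = np.zeros((V, T))            # C_WT[w, t]: word w assigned to topic t
C_AT = np.zeros((A, T))            # C_AT[a, t]: author a assigned to topic t
assign = []                        # per-token (author, topic) assignments

# Random initialization of (x, z) for every word token.
for words, authors in docs:
    doc_assign = []
    for w in words:
        x, z = rng.choice(authors), rng.integers(T)
        C_WT[w, z] += 1
        C_AT[x, z] += 1
        doc_assign.append((x, z))
    assign.append(doc_assign)

for sweep in range(200):           # collapsed Gibbs sweeps
    for d, (words, authors) in enumerate(docs):
        for i, w in enumerate(words):
            x, z = assign[d][i]
            C_WT[w, z] -= 1        # exclude the current token's counts
            C_AT[x, z] -= 1
            # P(x, z | rest) ∝ (C_WT[w,z] + β) / (Σ_w' C_WT[w',z] + V·β)
            #               × (C_AT[x,z] + α) / (Σ_t' C_AT[x,t'] + T·α)
            p_wz = (C_WT[w] + beta) / (C_WT.sum(axis=0) + V * beta)
            p_at = (C_AT[authors] + alpha) / (
                C_AT[authors].sum(axis=1, keepdims=True) + T * alpha)
            p = (p_at * p_wz).ravel()
            k = rng.choice(p.size, p=p / p.sum())
            x, z = authors[k // T], k % T
            C_WT[w, z] += 1
            C_AT[x, z] += 1
            assign[d][i] = (x, z)

# Point estimate of each author's topic distribution after sampling.
theta = (C_AT + alpha) / (C_AT.sum(axis=1, keepdims=True) + T * alpha)
print(np.round(theta, 2))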