2019
DOI: 10.1214/18-ejs1516

Convergence rates of latent topic models under relaxed identifiability conditions

Yining Wang

Abstract: In this paper we study the frequentist convergence rate for Latent Dirichlet Allocation (Blei et al., 2003) topic models. We show that the maximum likelihood estimator converges to one of the finitely many equivalent parameters in Wasserstein's distance metric at a rate of n^{-1/4}, without assuming separability or non-degeneracy of the underlying topics and/or the existence of more than three words per document, thus generalizing the previous works of Anandkumar et al. (2012, 2014) from an…
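For orientation, results of this kind are usually stated in terms of a Wasserstein distance between the estimated and true mixing measures over the topic simplex, in the spirit of Nguyen (2015). The LaTeX sketch below records that standard definition and the general form of the rate quoted in the abstract; the symbols (mixing measures G and G_0, topics beta_k, weights p_k, estimator G_n-hat) are our own illustrative notation, not the paper's exact statement.

% A hedged sketch in our own notation (not the paper's exact formulation).
% Mixing measures placing weight p_k on topic \beta_k in the vocabulary simplex:
\[
  G = \sum_{k=1}^{K} p_k \, \delta_{\beta_k}, \qquad
  G_0 = \sum_{k=1}^{K} p_k^0 \, \delta_{\beta_k^0}.
\]
% First-order Wasserstein distance between two such discrete measures, where
% \mathcal{Q}(p, p^0) denotes the set of couplings of the two weight vectors:
\[
  W_1(G, G_0) = \inf_{q \in \mathcal{Q}(p, p^0)} \; \sum_{k, k'} q_{k k'} \, \bigl\| \beta_k - \beta_{k'}^0 \bigr\|_1 .
\]
% The abstract's rate then has the form (up to the finitely many equivalent parameters):
\[
  W_1(\hat{G}_n, G_0) = O_P \bigl( n^{-1/4} \bigr).
\]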

Cited by 7 publications (8 citation statements); references 24 publications.

Citation statements (ordered by relevance):
“…For LDA and closely related topic models, there is a rich literature investigating identifiability under different assumptions (Anandkumar et al., 2012; Arora et al., 2012; Nguyen, 2015; Wang, 2019). Typically, when there is only one characteristic (p = 1), R ≥ 2 is necessary for identifiability; see Example 2 in Wang (2019). However, there has been limited consideration of identifiability of mixed membership models with multiple characteristics and one replication, i.e., p > 1 and R = 1.…”
Section: Strict Identifiability Conditions (mentioning, confidence: 99%)
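The necessity of R ≥ 2 when p = 1 can be seen from a one-line calculation; the sketch below uses generic notation (topic matrix B, per-document mixing weights θ drawn from a prior P, a single observed word x) that we introduce for illustration and that need not match the cited papers' statements.

% With a single replication (R = 1), the marginal law of the one observed word x is
\[
  \Pr(x = v) = \int \sum_{k=1}^{K} \theta_k B_{k v} \, \mathrm{d}P(\theta)
             = \sum_{k=1}^{K} \mathbb{E}_P[\theta_k] \, B_{k v},
\]
% so the data determine only the averaged row \sum_k E_P[\theta_k] B_{k \cdot} of the
% topic matrix; distinct pairs (P, B) sharing that average are indistinguishable,
% which is why at least two words per document (R >= 2) are needed.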
“…Therefore, identifiability can be guaranteed under very mild conditions; for example, one such condition is simply that C is of full rank. Under such Bayesian settings, posterior concentration rates have been established in Nguyen (2015) and Tang et al. (2014), and convergence rates for the maximum likelihood estimator (MLE) have been established in Anandkumar et al. (2012, 2014) and Wang (2019).…”
Section: The Bayesian Approach (mentioning, confidence: 99%)
“…(A1) is commonly imposed for technical reasons in other related work, such as Nguyen (2015) and Wang (2019), to avoid singularity issues. The geometric interpretation of the assumption in (A2) on W_c is that Conv(U_0) should contain a ball of a constant radius, which is again imposed to avoid singularity issues when a large proportion of the mixing weight vectors are too concentrated.…”
Section: Consistency and Error Analysis Under Fixed Mixing Weights (mentioning, confidence: 99%)
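To make the ball condition concrete, it can be written as follows, with the radius measured within the affine hull of the simplex; here Δ^{K-1} denotes the topic-weight simplex, Conv(U_0) the convex hull of the mixing weight vectors, and r the constant radius. This is our paraphrase of the quoted assumption, not the citing paper's exact statement.

% Hedged formalization of the quoted (A2)-style condition: the convex hull of the
% mixing weight vectors must contain a relative ball of constant radius r.
\[
  \exists\, x_0 \in \Delta^{K-1}, \; \exists\, r > 0 \ \text{constant}: \quad
  \bigl\{ x \in \operatorname{aff}(\Delta^{K-1}) : \lVert x - x_0 \rVert_2 \le r \bigr\}
  \subseteq \operatorname{Conv}(U_0),
\]
% ruling out the degenerate case where most weight vectors concentrate near a
% lower-dimensional face of the simplex.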