Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1349
Evaluating Topic Quality with Posterior Variability

Abstract: Probabilistic topic models such as latent Dirichlet allocation (LDA) are popularly used with Bayesian inference methods such as Gibbs sampling to learn posterior distributions over topic model parameters. We derive a novel measure of LDA topic quality using the variability of the posterior distributions. Compared to several existing baselines for automatic topic evaluation, the proposed metric achieves state-of-the-art correlations with human judgments of topic quality in experiments on three corpora. We ad…
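The abstract does not fully specify the metric, so the following is only a minimal illustrative sketch of the general idea: summarizing how much a topic's word probabilities fluctuate across posterior (Gibbs) samples, here via the mean coefficient of variation over the topic's top words. The function name, aggregation choice, and toy data are assumptions, not the paper's exact formulation.

```python
import numpy as np

def topic_variability_score(topic_word_samples, top_n=10):
    """Illustrative topic-quality signal from posterior variability.

    topic_word_samples: array of shape (n_samples, vocab_size) holding one
    topic's word-probability vector estimated at several Gibbs sampling
    iterations (posterior samples). This sketch uses the mean coefficient
    of variation over the topic's top-n words; the paper's metric may
    aggregate differently.
    """
    samples = np.asarray(topic_word_samples, dtype=float)
    mean_probs = samples.mean(axis=0)                  # posterior mean per word
    top_words = np.argsort(mean_probs)[::-1][:top_n]   # indices of the topic's top-n words
    std = samples[:, top_words].std(axis=0)            # per-word variability across samples
    cv = std / (mean_probs[top_words] + 1e-12)         # coefficient of variation
    return cv.mean()                                   # lower variability ~ more stable topic

# Toy usage: 20 posterior samples over a 1000-word vocabulary for one topic.
rng = np.random.default_rng(0)
fake_samples = rng.dirichlet(np.full(1000, 0.01), size=20)
print(topic_variability_score(fake_samples))
```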

Cited by 8 publications (8 citation statements)
References 16 publications (14 reference statements)
“…We evaluated our estimator of topic quality on a dataset of articles extracted from the New York Times, which was already analysed by [47].…”
Section: Dataset and Pre-processing
confidence: 99%
“…Unsupervised methods typically design features based on the assumption that segments in the same topic are more coherent than those that belong to different topics, such as lexical cohesion (Hearst, 1997; Choi, 2000; Riedl and Biemann, 2012b), topic models (Misra et al., 2009; Riedl and Biemann, 2012a; Jameel and Lam, 2013; Du et al., 2013) and semantic embeddings (Glavaš et al., 2016; Solbiati et al., 2021; Xing and Carenini, 2021). In contrast, supervised models can achieve more precise predictions by automatically mining clues of topic shift from large amounts of labeled data, either by classification on pairs of sentences or chunks (Wang et al., 2017; Lukasik et al., 2020) or sequence labeling on the whole input sequence (Koshorek et al., 2018; Badjatiya et al., 2018; Xing et al., 2020). However, the memory consumption and efficiency of neural models such as BERT (Kenton and Toutanova, 2019) can be limiting factors for modeling long documents as their length increases.…”
Section: Topic Segmentation Models
confidence: 99%
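As a concrete illustration of the supervised framing described in the statement above, the sketch below converts gold-standard segments into sentence-level binary labels for sequence labeling, a common setup in which the last sentence of each segment is marked as a boundary. The helper name and labeling convention are assumptions for illustration; the cited models' features and encoders are not reproduced.

```python
from typing import List, Tuple

def to_sequence_labels(segments: List[List[str]]) -> List[Tuple[str, int]]:
    """Frame topic segmentation as sentence-level sequence labeling.

    Each sentence receives a binary label marking whether it ends a
    topic segment (1 = segment boundary, 0 = otherwise).
    """
    labeled = []
    for segment in segments:
        for i, sentence in enumerate(segment):
            is_boundary = int(i == len(segment) - 1)  # 1 if last sentence of its segment
            labeled.append((sentence, is_boundary))
    return labeled

# Toy document with two gold segments.
doc = [
    ["The volcano erupted overnight.", "Ash covered nearby towns."],
    ["Meanwhile, the election results were announced.", "Turnout was high."],
]
for sentence, label in to_sequence_labels(doc):
    print(label, sentence)
```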
“…However, the fluency of the constructed document is too low, so the semantic information is largely lost. Xing et al. (2020) proposed adding the Consecutive Sentence-pair Coherence (CSC) task, which computes cosine similarity as a coherence score. However, CSC considers no incoherent sentence pairs other than those located at segment boundaries.…”
Section: Coherence Modeling
confidence: 99%
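To make the cosine-similarity coherence score mentioned above concrete, here is a minimal sketch that scores a consecutive sentence pair from precomputed embeddings. The random vectors stand in for a neural encoder's outputs and the function name is hypothetical; this is not the cited CSC implementation.

```python
import numpy as np

def coherence_score(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two sentence embeddings, used as a
    coherence score for a consecutive sentence pair (CSC-style)."""
    denom = np.linalg.norm(emb_a) * np.linalg.norm(emb_b) + 1e-12
    return float(emb_a @ emb_b / denom)

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(1)
sent_a, sent_b = rng.normal(size=384), rng.normal(size=384)
print(coherence_score(sent_a, sent_b))
```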