Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua 2015
DOI: 10.3115/v1/n15-1018

TopicCheck: Interactive Alignment for Assessing Topic Model Stability

Abstract: Content analysis, a widely-applied social science research method, is increasingly being supplemented by topic modeling. However, while the discourse on content analysis centers heavily on reproducibility, computer scientists often focus more on scalability and less on coding reliability, leading to growing skepticism on the usefulness of topic models for automated content analysis. In response, we introduce TopicCheck, an interactive tool for assessing topic model stability. Our contributions are threefold. F…

Cited by 49 publications (81 citation statements)
References 32 publications
“…We thus turn to a labeling process mediated by computation known as topic modeling. Topic modeling methods are often used to categorize a set of unlabeled documents into a finite number of coherent topic clusters [7,28,63], in turn helping researchers to navigate a large corpus (HIT documents in our case) [17,18,24,54]. However, unsupervised clustering of unlabeled documents is far from a solved problem [16,18].…”
Section: Types Of Work (mentioning)
confidence: 99%
“…In particular, for "spreadout" topics that were inherently difficult to interpret, because their tokens were drawn from a wide variety of Newsgroups (similar to a "fused" topic in Chuang et al (2013b)), we expected the proportion of correct responses to be roughly 1/3 no matter the value of λ used to compute relevance. Similarly, for very "pure" topics, whose tokens were drawn almost exclusively from one Newsgroup, we expected the task to be easy for any value of λ.…”
Section: User Study (mentioning)
confidence: 99%
“…Chang et al (2009) Ramage et al (2009) assert that "characterizing topics is hard" and describe how using the top-k terms for a given topic might not always be best, but offer few concrete alternatives. AlSumait et al (2009), Mimno et al (2011), and Chuang et al (2013b) develop quantitative methods for measuring the interpretability of topics based on experiments with data sets that come with some notion of topical ground truth, such as document metadata or expert-created topic labels. These methods are useful for understanding, in a global sense, which topics are interpretable (and why), but they don't specifically attempt to aid the user in interpreting individual topics.…”
Section: Topic Interpretation and Coherence (mentioning)
confidence: 99%
“…Ramage et al (2009) assert that "characterizing topics is hard" and describe how using the top-k terms for a given topic might not always be best, but offer few concrete alternatives. AlSumait et al (2009), and Chuang et al (2013b) develop quantitative methods for measuring the interpretability of topics based on experiments with data sets that come with some notion of topical ground truth, such as document metadata or expert-created topic labels. These methods are useful for understanding, in a global sense, which topics are interpretable (and why), but they don't specifically attempt to aid the user in interpreting individual topics.…”
Section: Topic Interpretation and Coherence (mentioning)
confidence: 99%