2021
DOI: 10.1080/19312458.2021.1920008

Expert-Informed Topic Models for Document Set Discovery

Abstract: The first step in many text-as-data studies is to find documents that address a specific topic within a larger document set. Researchers often rely on simple keyword searches to do this, even though this may introduce considerable selection bias. Such bias may be even greater when researchers lack the domain knowledge required to make informed search decisions, for example, in cross-national research or research on unfamiliar social contexts. We propose expert-informed topic modeling (EITM) as a hybrid approac…

Cited by 11 publications (5 citation statements)
References 30 publications

“…The main approach in these studies is reliant on the researchers' secondary data analysis. Although two previous studies (Kaplan and Vakili, 2015; Rinke et al, 2021) engaged with experts, their involvement was limited to providing labels and semantic ratings for their study. In contrast, we contributed to these methods by seeking insights from participants by asking their reasoning for enhanced analysis CGT.…”
Section: Methodological Implications (mentioning)
confidence: 99%
“…Likewise, to build this corpus, an expert survey was conducted among 76 communication and religious studies scholars in the societies, who named relevant debates on the public role of religion and secularism in the respective society and a list of keywords associated with each debate. Based on these keywords, the articles and blog posts were selected in a novel, expert-informed topic modeling process (for detailed information see Rinke et al, 2021). The base corpus and the expert survey results then guided the following four data collection paths:…”
Section: Methods (mentioning)
confidence: 99%
“…In total, 76 Facebook pages of partisan collective actors and 41 Facebook pages of alternative media were selected for analysis (Supplemental Appendix B/B-3). We collected all entries posted by these pages in the period of investigation and scored them for subject relevance with topic models that were built from extensive text corpora (Rinke et al, 2021) and that relied on the expert survey keywords. A cut-off for relevance was defined with gold standards of n = 300 comments in each country, each of which was scored by two trained coders with Krippendorff’s α nominal of .78.…”
Section: Methods (mentioning)
confidence: 99%
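
The statement above reports intercoder reliability as Krippendorff's α for nominal data. As a rough illustration of how such a value is obtained, the following minimal sketch computes nominal α for exactly two coders with no missing codes. The function name and the toy labels are hypothetical and are not taken from the cited study; the sketch only implements the standard coincidence-matrix definition of α.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha (nominal) for two coders, no missing values.

    Hypothetical helper for illustration; not code from the cited study.
    """
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must rate the same units")

    # Coincidence matrix: each unit contributes its pair of codes in both orders.
    coincidences = Counter()
    for a, b in zip(coder_a, coder_b):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1

    n = 2 * len(coder_a)  # total number of pairable values

    # Marginal totals per category (row sums of the coincidence matrix).
    totals = Counter()
    for (c, _k), count in coincidences.items():
        totals[c] += count

    # Observed and expected disagreement counts over mismatching category pairs.
    observed = sum(cnt for (c, k), cnt in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c, k in permutations(totals, 2))
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")

    # alpha = 1 - D_o / D_e, with D_o = observed / n and D_e = expected / (n (n - 1)).
    return 1.0 - (n - 1) * observed / expected


# Toy example: relevance codes (1 = relevant, 0 = not relevant) from two coders.
coder_1 = [1, 1, 0, 0, 1, 0]
coder_2 = [1, 1, 0, 0, 1, 0]
print(krippendorff_alpha_nominal(coder_1, coder_2))
```

With identical codes, as in the toy example, the observed disagreement is zero and α equals 1; values near 0 indicate agreement no better than chance.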
“…In order to draw a representative sample from the Mannheim International News Discourse Data Set (MIND; Rinke et al 2019) for manual content analysis we used the approach of expert-informed topic modeling (EITM; Rinke et al 2021) to identify the population of relevant articles available in a text format. EITM is an efficient approach combining expert domain knowledge and automated classification algorithms to identify and rank articles belonging to a specific master topic in unstructured text corpora.…”
Section: Methods (mentioning)
confidence: 99%