2021
DOI: 10.1017/pan.2021.15

Multi-Label Prediction for Political Text-as-Data

Abstract: Political scientists increasingly use supervised machine learning to code multiple relevant labels from a single set of texts. The current “best practice” of individually applying supervised machine learning to each label ignores information on inter-label association(s), and is likely to under-perform as a result. We introduce multi-label prediction as a solution to this problem. After reviewing the multi-label prediction framework, we apply it to code multiple features of (i) access to information requests m…
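The contrast the abstract draws can be illustrated with a minimal sketch (not the authors' implementation): a "binary relevance" baseline that fits one independent classifier per label, against a classifier chain, one common multi-label method in which each classifier also conditions on predictions for earlier labels and can therefore exploit inter-label association. The data here are synthetic stand-ins for document features and correlated labels.

```python
# Sketch comparing independent per-label classifiers ("binary relevance")
# with a classifier chain that can capture inter-label associations.
# Synthetic data; not the authors' models or datasets.
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# Stand-in for document features X and a matrix Y of co-occurring labels.
X, Y = make_multilabel_classification(
    n_samples=1000, n_features=20, n_classes=5, random_state=0
)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Baseline: one independent logistic regression per label.
independent = MultiOutputClassifier(LogisticRegression(max_iter=1000))
independent.fit(X_tr, Y_tr)

# Multi-label alternative: each classifier in the chain also sees the
# predictions for the labels earlier in the chain.
chain = ClassifierChain(LogisticRegression(max_iter=1000), random_state=0)
chain.fit(X_tr, Y_tr)

print("independent micro-F1:",
      f1_score(Y_te, independent.predict(X_te), average="micro"))
print("chain micro-F1:",
      f1_score(Y_te, chain.predict(X_te), average="micro"))
```

Which approach wins depends on how strongly the labels co-occur in a given coding scheme; the point of the sketch is only that the chain has access to association information the independent models discard.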

Cited by 5 publications (3 citation statements) · References 50 publications
“…Overall, we find that transformer classification models can tackle coding schemes of varying complexity well. In line with some recent research, we do find that it can be beneficial to use supervised machine learning models designed for category co-occurrence when working with particularly complex coding schemes (Erlich et al., 2022). Other methods, including dictionaries, logistic regression, and even zero-shot classification, tend to capture co-occurrence patterns less well.…”
Section: Conclusion and Final Remarks (supporting)
confidence: 88%
“…We train separate RF and SVM models for each coding category of interest in applications where categories can co-occur across texts. For a comprehensive overview of other solutions for tackling co-occurrence with SML algorithms, see Erlich and colleagues (2022).…”
(mentioning)
confidence: 99%
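The per-category setup this footnote describes, a separate random forest and a separate SVM fitted for each coding category, can be sketched as follows, with hypothetical feature and label arrays standing in for the real data:

```python
# Sketch of a per-category ("binary relevance") setup: one RF and one
# linear SVM per coding category. Hypothetical data, not the cited study's.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                 # document feature matrix
Y = (rng.random((200, 3)) < 0.3).astype(int)   # 3 co-occurring binary categories

rf_models, svm_models = [], []
for k in range(Y.shape[1]):
    # Each category gets its own independently trained pair of models.
    rf_models.append(
        RandomForestClassifier(n_estimators=100, random_state=0).fit(X, Y[:, k])
    )
    svm_models.append(LinearSVC().fit(X, Y[:, k]))

# One prediction per document per category, stacked back into a label matrix.
rf_preds = np.column_stack([m.predict(X) for m in rf_models])
```

Because each model sees only its own category's labels, any co-occurrence structure across categories is ignored, which is exactly the limitation the multi-label methods discussed above are meant to address.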
“…Usually, researchers assign these texts multiple labels according to the hierarchical classes above through time-consuming manual coding before analyzing this information and extracting knowledge about poverty governance [6,7]. A classification model based on natural language processing is therefore the primary method for automatic multi-label classification [8].…”
Section: Introduction (mentioning)
confidence: 99%