2020
DOI: 10.1017/pan.2020.4

Active Learning Approaches for Labeling Text: Review and Assessment of the Performance of Active Learning Approaches

Abstract: Supervised machine learning methods are increasingly employed in political science. Such models require costly manual labeling of documents. In this paper, we introduce active learning, a framework in which data to be labeled by human coders are not chosen at random but rather targeted in such a way that the required amount of data to train a machine learning model can be minimized. We study the benefits of active learning using text data examples. We perform simulation studies that illustrate conditions where…

Cited by 49 publications (42 citation statements)
References 31 publications (2 reference statements)
“…Instead of having to go though millions and millions of tweets, this could thus serve as first rough attempt at separating tweets into hateful and non-hateful tweets. This could, e.g., be combined with the active-labeling approach of Miller, Linder, and Mebane (2020).…”
Section: Conclusion and Discussion (mentioning)
confidence: 99%
“…Hiring student assistants or engaging other researchers in hand-labeling text is extremely time consuming, in terms of designing the coding scheme, potentially redoing the coding scheme and, of course, the actual task of hand-labeling. Even in the case of so-called "active learning" (Miller, Linder, and Mebane 2020), designing a coding scheme and actively labeling a significant amount of text is still necessary. These costs, furthermore, have conceptual consequences as they limit the complexity of the constructs that are most often assessed with supervised approaches.…”
Section: Supervised Text Classification (mentioning)
confidence: 99%
“…The key insight to AFSM is leveraging one of the foremost principles of human-computer interaction: have computers do what they do well, and let humans do what they do well (Lazar, Feng, and Hochheiser 2017; Norman and Draper 1986). Drawing inspiration from the literatures on adaptive machine learning and text-as-data (Enamorado 2018; Miller, Linder, and Mebane 2019), we find that although computers can quickly identify large sets of possible matches, only humans can quickly identify whether a proposed match is correct (Mozer et al 2018).…”
Section: An Adaptive Algorithm (mentioning)
confidence: 99%
“…In addition, there is another method called active learning, which selects the instances to be labeled from all unlabeled data to build the most accurate model. The active learning method consists of an iterative process of two operations: query and oracle (Aggarwal et al, 2014;Miller, 2020). In a query operation, it must decide which data instance should be queried for its label.…”
Section: Literature Review (mentioning)
confidence: 99%
“…As a result, our method can build the best training set with a cost limitation. On the contrary, the goal of traditional active learning is usually to build effective classifiers using the fewest number of instances (Miller, 2020).…”
Section: Literature Review (mentioning)
confidence: 99%
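
The query/oracle loop described in the Literature Review statements above can be illustrated with a small sketch. This is not the method of the cited paper or of any citing publication: it assumes a toy one-dimensional nearest-centroid classifier, a simple uncertainty-sampling rule (query the point nearest the decision boundary), and a simulated oracle with a hypothetical true threshold of 0.6 standing in for a human coder. All function names are made up for illustration.

```python
def fit_centroids(labeled):
    """Toy 1-D classifier: the mean feature value of each class.
    Assumes the labeled set already holds at least one example per class."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y in labeled:
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in (0, 1)}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def query(centroids, pool):
    """Query step: choose the unlabeled point the model is least sure about,
    here the one closest to the midpoint between the two centroids."""
    boundary = (centroids[0] + centroids[1]) / 2
    return min(pool, key=lambda x: abs(x - boundary))


def oracle(x):
    """Oracle step: stand-in for a human coder.
    The 0.6 threshold is a hypothetical ground truth for this sketch."""
    return int(x >= 0.6)


def active_learning(pool, seed_labeled, budget):
    """Alternate query and oracle steps until the labeling budget is spent."""
    labeled = list(seed_labeled)
    pool = list(pool)
    for _ in range(budget):
        if not pool:
            break
        centroids = fit_centroids(labeled)   # retrain on the current labels
        x = query(centroids, pool)           # pick the most informative point
        pool.remove(x)
        labeled.append((x, oracle(x)))       # ask the oracle for its label
    return fit_centroids(labeled), labeled
```

Calling `active_learning([i / 20 for i in range(1, 20)], [(0.0, 0), (1.0, 1)], budget=5)` spends five labels on points near the current boundary rather than on points drawn at random, which is the contrast the abstract draws. With text data the classifier and the uncertainty measure would differ, but the loop keeps the same shape.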