Companion Proceedings of The Web Conference 2018 (WWW '18)
DOI: 10.1145/3184558.3191546

Exploring the Efficiency of Batch Active Learning for Human-in-the-Loop Relation Extraction

Abstract: Domain-specific relation extraction requires training data for supervised learning models, and thus, significant labeling effort. Distant supervision is often leveraged for creating large annotated corpora; however, these methods require handling the inherent noise. On the other hand, active learning approaches can reduce the annotation cost by selecting the most beneficial examples to label in order to learn a good model. The choice of examples can be performed sequentially, i.e. select one example in each iter…

Cited by 14 publications (5 citation statements)
References 36 publications

“…Batch‐mode AL is a practical technique where the most informative essays are identified in each training iteration. Batch‐mode AL selection serves as an improvement over single‐instance selection because, instead of sequentially selecting a single essay in each training iteration, a set of essays can be selected in each iteration (Lourentzou et al., 2018). The general workflow for batch‐mode AL begins with a given set of training data that contain scores; a model is built and fit to the training data.…”
Section: Active Learning Methods
confidence: 99%
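The workflow the quoted passage describes reads naturally as a loop: fit a model on the labeled set, score the unlabeled pool, query labels for a whole batch, and repeat. Below is a minimal sketch of such a batch-mode AL loop, assuming margin-based uncertainty sampling, a scikit-learn classifier, and a hypothetical `oracle_label` callback standing in for the human annotator; it illustrates the general workflow, not the specific method of any cited paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def batch_active_learning(X_labeled, y_labeled, X_pool, oracle_label,
                          batch_size=10, rounds=5):
    """Generic batch-mode AL loop: fit, score the pool, query a batch of labels."""
    model = None
    for _ in range(rounds):
        model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
        # Margin-based uncertainty: difference between the top-2 class probabilities.
        proba = np.sort(model.predict_proba(X_pool), axis=1)
        margin = proba[:, -1] - proba[:, -2]
        batch = np.argsort(margin)[:batch_size]   # smallest margins = most uncertain
        y_new = oracle_label(X_pool[batch])       # hypothetical human-annotator callback
        # Move the newly labeled batch from the pool into the training set.
        X_labeled = np.vstack([X_labeled, X_pool[batch]])
        y_labeled = np.concatenate([y_labeled, y_new])
        X_pool = np.delete(X_pool, batch, axis=0)
    return model
```

Selecting `batch_size` points per round, rather than one, amortizes the cost of retraining the model across many annotations, which is the efficiency argument the quoted passage makes.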
“…Other works focus on interleaving processes to reduce annotator waiting time in batch active learning [38], and on active learning in domain adaptation settings by clustering uncertainty‐weighted embeddings [39] or by utilizing reinforcement learning [40], Bayesian Optimization [41], and domain similarity metrics [42]. Recent works formulate active learning as a multi-armed bandit problem and select data from a set of candidates in each round [43,44,45].…”
Section: B. Learn To Select Data
confidence: 99%
“…Global methods try to find the most informative set of samples from the whole space directly by solving an optimization problem [20], [24]–[28]. These approaches have mathematically and empirically demonstrated good performance; however, they do not scale well with big datasets [29]. On the other hand, clustering‐based methods, which are highly scalable, partition either the whole [30] or a fraction of (i.e.…”
Section: Related Work
confidence: 99%
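The clustering-based alternative mentioned in this quote can be sketched briefly: partition the pool with k-means and take the member closest to each centroid as the batch, so each selected point represents one region of the space. This is a hedged illustration of the general idea, not the method of [30]; `batch_size` doubling as the number of clusters is an assumption of the sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_batch(X_pool, batch_size):
    """Clustering-based selection: one representative pool point per k-means cluster."""
    km = KMeans(n_clusters=batch_size, n_init=10).fit(X_pool)
    batch = []
    for c in range(batch_size):
        members = np.where(km.labels_ == c)[0]
        # The member closest to the centroid stands in for its whole cluster.
        dists = np.linalg.norm(X_pool[members] - km.cluster_centers_[c], axis=1)
        batch.append(members[np.argmin(dists)])
    return np.array(batch)
```

Because k-means runs in time roughly linear in the pool size, this scales to large datasets where the global optimization formulations cited above become impractical.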
“…The main objective of DS3 is to develop a scalable batch‐mode framework for the class‐imbalance problem. The success of batch mode active learning (BMAL) depends on selecting representative samples [37] as well as the batch size and total budget constraints [29]. The key question is how to find the most representative samples from both the minority and majority classes to cover the whole uncertain space given the limited budget.…”
Section: A. Batch-Mode Imbalance Learning
confidence: 99%
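As a rough illustration of the budget question raised here, one simple strategy splits the per-round budget between predicted-minority and predicted-majority pool points and takes the most uncertain examples from each side. This is a simplified sketch, not the DS3 algorithm from the quoted paper; `margin`, `pred`, and `minority_frac` are assumed inputs.

```python
import numpy as np

def imbalance_aware_batch(margin, pred, budget, minority_label=1, minority_frac=0.5):
    """Split a labeling budget between predicted-minority and predicted-majority
    pool points, taking the most uncertain (smallest-margin) examples from each.
    A simplified illustration only, not the DS3 method."""
    k_min = int(round(budget * minority_frac))
    min_idx = np.where(pred == minority_label)[0]
    maj_idx = np.where(pred != minority_label)[0]
    # Most uncertain candidates within each predicted class, up to its share.
    pick_min = min_idx[np.argsort(margin[min_idx])[:k_min]]
    pick_maj = maj_idx[np.argsort(margin[maj_idx])[:budget - len(pick_min)]]
    return np.concatenate([pick_min, pick_maj])
```

Reserving part of the budget for predicted-minority points guards against the batch being dominated by the majority class, which is the coverage concern the quoted passage raises.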