Robbie Haertel scite author profile

Robbie Haertel

5Publications

84Citation Statements Received

30Citation Statements Given

How they've been cited

How they cite others

Affiliations

Brigham Young University

Publications

Order By: Most citations

Assessing the costs of sampling methods in active learning for annotation

Haertel¹,

Ringger²,

Seppi³

et al. 2008

View full text Add to dashboard Cite

Traditional Active Learning (AL) techniques assume that the annotation of each datum costs the same. This is not the case when annotating sequences; some sequences will take longer than others. We show that the AL technique which performs best depends on how cost is measured. Applying an hourly cost model based on the results of an annotation user study, we approximate the amount of time necessary to annotate a given sentence. This model allows us to evaluate the effectiveness of AL sampling methods in terms of time spent in annotation. We acheive a 77% reduction in hours from a random baseline to achieve 96.5% tag accuracy on the Penn Treebank. More significantly, we make the case for measuring cost in assessing AL methods.

show abstract

Active learning for part-of-speech tagging

Ringger¹,

Morales²,

Haertel³

et al. 2007

View full text Add to dashboard Cite

In the construction of a part-of-speech annotated corpus, we are constrained by a fixed budget. A fully annotated corpus is required, but we can afford to label only a subset. We train a Maximum Entropy Markov Model tagger from a labeled subset and automatically tag the remainder. This paper addresses the question of where to focus our manual tagging efforts in order to deliver an annotation of highest quality. In this context, we find that active learning is always helpful. We focus on Query by Uncertainty (QBU) and Query by Committee (QBC) and report on experiments with several baselines and new variations of QBC and QBU, inspired by weaknesses particular to their use in this application. Experiments on English prose and poetry test these approaches and evaluate their robustness. The results allow us to make recommendations for both types of text and raise questions that will lead to further inquiry.

show abstract

Early Gains Matter: A Case for Preferring Generative over Discriminative Crowdsourcing Models

Felt

Black²,

Ringger

et al. 2015

View full text Add to dashboard Cite

In modern practice, labeling a dataset often involves aggregating annotator judgments obtained from crowdsourcing. State-of-theart aggregation is performed via inference on probabilistic models, some of which are dataaware, meaning that they leverage features of the data (e.g., words in a document) in addition to annotator judgments. Previous work largely prefers discriminatively trained conditional models. This paper demonstrates that a data-aware crowdsourcing model incorporating a generative multinomial data model enjoys a strong competitive advantage over its discriminative log-linear counterpart in the typical crowdsourcing setting. That is, the generative approach is better except when the annotators are highly accurate in which case simple majority vote is often sufficient. Additionally, we present a novel mean-field variational inference algorithm for the generative model that significantly improves on the previously reported state-of-the-art for that model. We validate our conclusions on six text classification datasets with both human-generated and synthetic annotations.

show abstract

Most Maya Glyphs Are Written in Ch′olti′an

Law¹,

Robertson

Houston

et al. 2009

View full text Add to dashboard Cite

An Analytic and Empirical Evaluation of Return-on-Investment-Based Active Learning

Haertel

Ringger

Seppi

et al. 2015

View full text Add to dashboard Cite

Return-on-Investment (ROI) is a costconscious approach to active learning (AL) that considers both estimates of cost and of benefit in active sample selection. We investigate the theoretical conditions for successful cost-conscious AL using ROI by examining the conditions under which ROI would optimize the area under the cost/benefit curve. We then empirically measure the degree to which optimality is jeopardized in practice when the conditions are violated. The reported experiments involve an English part-of-speech annotation task. Our results show that ROI can indeed successfully reduce total annotation costs and should be considered as a viable option for machine-assisted annotation. On the basis of our experiments, we make recommendations for benefit estimators to be employed in ROI. In particular, we find that the more linearly related a benefit estimate is to the true benefit, the better the estimate performs when paired in ROI with an imperfect cost estimate. Lastly, we apply our analysis to help explain the mixed results of previous work on these questions.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Robbie Haertel

Assessing the costs of sampling methods in active learning for annotation

Active learning for part-of-speech tagging

Early Gains Matter: A Case for Preferring Generative over Discriminative Crowdsourcing Models

Most Maya Glyphs Are Written in Ch′olti′an

An Analytic and Empirical Evaluation of Return-on-Investment-Based Active Learning

Contact Info

Product

Resources

About