2011
DOI: 10.5715/jnlp.18.153

Active Learning with Subsequence Sampling Strategy for Sequence Labeling Tasks

Abstract: We propose an active learning framework for sequence labeling tasks. In each iteration, a set of subsequences is selected and manually labeled, while the remaining parts of the sequences are left unannotated. Learning stops automatically when the training data does not change significantly between consecutive iterations. We evaluate the proposed framework on chunking and named entity recognition data provided by CoNLL. Experimental results show that we succeed in obtaining the supervised F1 only with 6.98%, …
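The iteration loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `token_confidence` values stand in for a CRF's marginal probabilities, and `oracle` stands in for the human annotator. The stopping rule mirrors the abstract's criterion of halting when the training data stops changing between iterations.

```python
# Hypothetical sketch of the framework's loop: select low-confidence
# subsequence positions for labeling, stop when the training set is stable.

def select_subsequences(confidences, threshold=0.5):
    """Return (seq_idx, tok_idx) positions whose confidence is below threshold."""
    picked = []
    for i, seq in enumerate(confidences):
        for j, c in enumerate(seq):
            if c < threshold:
                picked.append((i, j))
    return picked

def active_learning(confidences, oracle, max_iters=10):
    labeled = set()
    for _ in range(max_iters):
        new = set(select_subsequences(confidences)) - labeled
        if not new:          # training data unchanged between iterations -> stop
            break
        for pos in new:
            oracle(pos)      # manual labeling of the selected position (simulated)
        labeled |= new
        # a real system would retrain the (partial) CRF here and
        # recompute `confidences` from the updated model
    return labeled
```

In a real system the confidences would be refreshed after each retraining pass, so the selection shrinks as the model improves; here they are static, so the loop terminates on the second pass.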

Cited by 7 publications (14 citation statements) | References 12 publications (16 reference statements)
“…This way, the annotators only need to assign types to the chosen subspans without having to read and annotate the full sequence. To cope with the resulting partial annotation of sequences, we apply a constrained version of conditional random fields (CRFs), partial CRFs, during training that only learn from the annotated subspans (Tsuboi et al., 2008; Wanvarie et al., 2011). To evaluate our proposed methods, we conducted simulated active learning experiments on 5 languages: Spanish, Dutch, German, Hindi and Indonesian.…”
Section: Introduction
confidence: 99%
“…The main AL works in this latter line of work are (Shen et al., 2004) and (Wanvarie et al., 2011). Shen et al. (2004) adopted SVMs as the learning algorithm and proposed two strategies that combine three criteria: informativeness, representativeness and diversity.…”
Section: Related Work
confidence: 99%
“…At this point an AL strategy S will select a number of examples B that once labeled will hopefully improve the performance of the next classifier Φ_{i+1}. Algorithm 1 shows the pool-based AL framework for partially annotated sequences as introduced in (Wanvarie et al., 2011). Differently from AL for fully labeled sequences, thanks to the finer granularity of the partially labeled model, we use the token as the basic annotation unit, instead of the entire sequence.…”
Section: Active Learning Strategies
confidence: 99%
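The token-level selection step that this statement describes can be sketched as a batch pick over the pool. This is an illustrative sketch under assumed names: `confidences` is a hypothetical per-token confidence table from the current classifier, and `B` is the labeling budget per iteration mentioned in the quote.

```python
import heapq

def select_batch(confidences, B):
    """Pick the B least-confident tokens across the whole pool.

    Because the partially labeled model learns from individual annotated
    tokens, selection happens at token granularity rather than per sequence.
    """
    scored = [(c, i, j)
              for i, seq in enumerate(confidences)
              for j, c in enumerate(seq)]
    # nsmallest orders by confidence first, so ties break deterministically
    return [(i, j) for c, i, j in heapq.nsmallest(B, scored)]
```

With sequence-level selection the annotator would label every token of the chosen sequences; the token-level variant above spends the same budget B only on the individually most uncertain positions.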