Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers (NAACL '06), 2006
DOI: 10.3115/1614049.1614067

MMR-based active machine learning for bio named entity recognition

Abstract: This paper presents a new active learning paradigm which considers not only the uncertainty of the classifier but also the diversity of the corpus. The two measures for uncertainty and diversity were combined using the MMR (Maximal Marginal Relevance) method to give the sampling scores in our active learning strategy. We incorporated the MMR-based active machine learning idea into the biomedical named entity recognition system. Our experimental results indicated that our strategies for active-learning based sample s…

Cited by 39 publications (42 citation statements). References 11 publications.
“…Note, however, that the number of possible labelings grows exponentially with the length of x. To make this feasible, previous work (Kim et al., 2006) has employed an approximation we call N-best sequence entropy (NSE):…”
Section: Uncertainty Sampling
Confidence: 99%
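To make the NSE approximation quoted above concrete, here is a minimal Python sketch, assuming the sequence model exposes (unnormalized) probabilities for its N most likely labelings; the function name and inputs are illustrative, not the cited authors' code.

```python
import math

def n_best_sequence_entropy(n_best_probs):
    """Approximate the entropy over all labelings of a sentence by the
    entropy of its N-best list: renormalize the N-best probabilities to
    sum to 1, then compute the entropy of that truncated distribution."""
    total = sum(n_best_probs)
    probs = [p / total for p in n_best_probs]
    return -sum(p * math.log(p) for p in probs if p > 0)

# A peaked 3-best list signals a confident model (low entropy),
# so the sentence is a poor candidate for annotation.
print(n_best_sequence_entropy([0.90, 0.06, 0.04]))
```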
“…A few methods have been proposed, based mostly on the conventions of uncertainty sampling, where the learner queries the instance about which it has the least certainty (Scheffer et al., 2001; Culotta and McCallum, 2005; Kim et al., 2006), or query-by-committee, where a "committee" of models selects the instance about which its members most disagree (Dagan and Engelson, 1995). We provide more detail on these and the new strategies we propose in Section 3.…”
Section: Introduction
Confidence: 99%
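As a companion to the query-by-committee description in this excerpt, the sketch below shows one common disagreement measure, vote entropy, for sequence labeling; the per-token summation is an assumption about how disagreement is aggregated, not a claim about Dagan and Engelson's exact formulation.

```python
import math
from collections import Counter

def vote_entropy(committee_labelings):
    """Query-by-committee disagreement on one sentence: the entropy of
    the label votes cast by committee members, summed over token
    positions. committee_labelings holds one label sequence per member,
    all of equal length."""
    n_members = len(committee_labelings)
    disagreement = 0.0
    for position_votes in zip(*committee_labelings):
        counts = Counter(position_votes)
        # Entropy of the empirical vote distribution at this position.
        disagreement -= sum(
            (c / n_members) * math.log(c / n_members)
            for c in counts.values()
        )
    return disagreement

# Three members disagree on the second token, so disagreement > 0.
print(vote_entropy([["B", "I"], ["B", "O"], ["B", "I"]]))
```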
“…This criterion assumes that the most uncertain sentence is the most useful instance for learning an NER model. There are several ways to implement this, such as least confidence (Culotta and McCallum, 2005), where the lower the probability of a sequence of labels, the less confident the model, and entropy (Kim et al., 2006), which can measure the uncertainty of a probability distribution. Other criteria include a diversity measurement (Kim et al., 2006) and a density criterion (Settles and Craven, 2008).…”
Section: Comparison With Baselines
Confidence: 99%
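The two uncertainty criteria contrasted in this excerpt can be sketched directly. The sketch below assumes access to the probability of the model's best label sequence and to a normalized distribution over candidate labelings; the helper names are illustrative.

```python
import math

def least_confidence(best_sequence_prob):
    """Least confidence: the lower the probability of the most likely
    label sequence, the less confident the model, so 1 - P(y*|x)
    ranks uncertain sentences highest."""
    return 1.0 - best_sequence_prob

def distribution_entropy(probs):
    """Entropy of a normalized distribution over candidate labelings,
    an alternative uncertainty measure."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Both criteria flag a near-uniform model as highly uncertain.
print(least_confidence(0.34))                    # ~0.66
print(distribution_entropy([0.34, 0.33, 0.33]))  # near the 3-way maximum
```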
“…We experimented with the following criteria: least confidence (Culotta and McCallum, 2005), normalized entropy (Kim et al., 2006), MMR (Maximal Marginal Relevance) (Kim et al., 2006), density (Settles and Craven, 2008) when using feature vectors and word embeddings, and the combination of the least confidence and density criteria. Equation 8 describes the combination criterion used in our experiments.…”
Section: Active Learning Criteria
Confidence: 99%
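Since the paper's abstract describes combining an uncertainty score with a diversity score via MMR, here is a hedged sketch of MMR-style batch selection; the trade-off weight lam, the similarity inputs, and the greedy loop are assumptions for illustration, not the authors' exact scoring function or the Equation 8 mentioned in the excerpt.

```python
def mmr_score(uncertainty, similarities_to_selected, lam=0.6):
    """MMR-style sampling score: reward classifier uncertainty while
    penalizing redundancy with sentences already chosen for annotation.
    lam trades off uncertainty against diversity (assumed value)."""
    redundancy = max(similarities_to_selected, default=0.0)
    return lam * uncertainty - (1.0 - lam) * redundancy

def select_batch(candidates, batch_size, lam=0.6):
    """Greedily pick a diverse, uncertain batch of sentences.
    candidates maps sentence id -> (uncertainty, {other_id: similarity});
    this structure is illustrative, not the paper's API."""
    selected = []
    while len(selected) < batch_size and len(selected) < len(candidates):
        best = max(
            (sid for sid in candidates if sid not in selected),
            key=lambda sid: mmr_score(
                candidates[sid][0],
                [candidates[sid][1].get(s, 0.0) for s in selected],
                lam,
            ),
        )
        selected.append(best)
    return selected
```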
“…Kim et al. (2006) propose using entropy as a confidence estimator in active learning in CRFs, where the examples with the most uncertainty are selected for presentation to human labelers. In practice, they approximate the entropy of the labels given the N-best labels.…”
Section: Confidence Estimation
Confidence: 99%