Proceedings of the Linguistic Annotation Workshop (LAW '07), 2007
DOI: 10.3115/1642059.1642075

Active learning for part-of-speech tagging

Abstract: In the construction of a part-of-speech annotated corpus, we are constrained by a fixed budget. A fully annotated corpus is required, but we can afford to label only a subset. We train a Maximum Entropy Markov Model tagger from a labeled subset and automatically tag the remainder. This paper addresses the question of where to focus our manual tagging efforts in order to deliver an annotation of highest quality. In this context, we find that active learning is always helpful. We focus on Query by Uncertainty (Q…

Cited by 37 publications (29 citation statements)
References 15 publications
“…The direction that Ringger et al (2007) pursue is perhaps the most similar to ours. They attempt to reduce supervision required for high POS tagging performance based on active learning.…”
Section: Related Work (mentioning)
confidence: 64%
“…Active learning (further elaborated on in Section 5) has previously been successfully applied to a number of language technology tasks, including information extraction [26,6], named entity recognition [27,30], text categorization [14,11], and part-of-speech tagging [7,25]. When applicable, the active learning paradigm has the desirable effect of creating high performing classifiers using less data than required by competitive classifiers trained on a random selection of data.…”
Section: The Bootmark Methods (mentioning)
confidence: 99%
“…One of the important tasks for the future is the compilation of a part-of-speech annotated corpus, which will allow us to build more robust disambiguation models. The tagger presented in this paper, while imperfect, can be useful in the process of creating such corpus, e.g., by applying it in an active learning scenario [61].…”
Section: Tokenization (mentioning)
confidence: 99%