Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003
DOI: 10.3115/1119176.1119183

Bootstrapping POS taggers using unlabelled data

Abstract: This paper investigates bootstrapping part-of-speech taggers using co-training, in which two taggers are iteratively re-trained on each other's output. Since the output of the taggers is noisy, there is a question of which newly labelled examples to add to the training set. We investigate selecting examples by directly maximising tagger agreement on unlabelled data, a method which has been theoretically and empirically motivated in the co-training literature. Our results show that agreement-based co-training c…
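
To make the abstract's procedure concrete, here is a minimal Python sketch of agreement-based co-training. Everything in it is an assumption for illustration: the `make_t1`/`make_t2` factories, the `tag` interface, and the random subset-selection heuristic are hypothetical stand-ins, and the paper's actual taggers and selection procedure differ in detail.

```python
import random

def agreement(t1, t2, sentences):
    """Per-token agreement of two taggers on raw (untagged) sentences."""
    same = total = 0
    for sent in sentences:
        a, b = t1.tag(sent), t2.tag(sent)
        same += sum(x == y for x, y in zip(a, b))
        total += len(sent)
    return same / max(total, 1)

def co_train(make_t1, make_t2, seed, unlabelled, dev_unlabelled,
             rounds=5, cache_size=50, n_candidates=10):
    """Agreement-based co-training (hypothetical interface): make_t1 /
    make_t2 build a tagger from tagged sentences; tagger.tag() maps a
    token list to a tag list."""
    train1, train2 = list(seed), list(seed)
    t1, t2 = make_t1(train1), make_t2(train2)
    for _ in range(rounds):
        cache = random.sample(unlabelled, min(cache_size, len(unlabelled)))
        if not cache:
            break
        # Each tagger labels the cache; its output becomes candidate
        # training material for the *other* tagger.
        lab1 = [list(zip(s, t1.tag(s))) for s in cache]
        lab2 = [list(zip(s, t2.tag(s))) for s in cache]
        best = (agreement(t1, t2, dev_unlabelled), t1, t2, train1, train2)
        for _ in range(n_candidates):
            idx = random.sample(range(len(cache)), k=max(1, len(cache) // 2))
            cand1 = train1 + [lab2[i] for i in idx]  # t2 teaches t1
            cand2 = train2 + [lab1[i] for i in idx]  # t1 teaches t2
            c1, c2 = make_t1(cand1), make_t2(cand2)
            # Keep the subset that maximises agreement on held-out
            # unlabelled data, per the abstract's selection criterion.
            score = agreement(c1, c2, dev_unlabelled)
            if score > best[0]:
                best = (score, c1, c2, cand1, cand2)
        _, t1, t2, train1, train2 = best
    return t1, t2
```

Retraining both taggers for every candidate subset is expensive, which is why this sketch only samples a handful of random subsets per round; any practical implementation needs a similarly restricted search rather than exhaustive subset enumeration.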

Citation Types: 3 supporting, 40 mentioning, 0 contrasting

Year Published: 2005–2024


Cited by 75 publications (43 citation statements)
References 13 publications (18 reference statements)

“…The POS taggers experimented with are summarized in the following: when merging human-labeled data and auto-tagged data in the data combination, we simply gave our human-labeled training data a relative weight of one. Such results coincide with previous work on self-training for POS tagging (Clark et al 2003). We evaluated POS taggers on the English and Chinese test sets using per-token accuracy as well as the parsing accuracy of the baseline parser.…”
Section: Improved POS Tagging (supporting)
confidence: 83%
“…More recently, alternative methods based on system combination were proposed. Clark, Curran and Osborne (2003) adopted the self-training approach and achieved positive results only when human-labeled data are limited. Søgaard (2010) studied system combination in a tri-training framework.…”
Section: Related Work (mentioning)
confidence: 99%
“…In practice, one has to start somewhere, so an initial annotation is first obtained independently from the detector; a detector is then trained with the annotation and used as an annotator itself to refine the annotation, which in turn leads to the training of an improved detector. In essence, WSL for object detection is similar to self-training [4] although the training data is not completely unlabelled. It thus suffers from the model drift problem, that is, when the initial annotation is inaccurate, or wrong annotations are introduced in the iterative learning process, the model can drift away quickly.…”
Section: Introduction (mentioning)
confidence: 99%
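
The annotate-retrain loop described in the excerpt above is standard self-training, the baseline against which the paper's co-training is compared. Here is a minimal sketch, assuming a hypothetical tagger factory and a `tag_with_confidence` interface (neither comes from the cited papers):

```python
def self_train(make_tagger, seed, unlabelled, rounds=5, threshold=0.9):
    """Self-training: the model labels data for itself, keeping only
    confident predictions. `make_tagger` builds a tagger from tagged
    sentences; `tag_with_confidence` returns (tags, confidence)."""
    train = list(seed)
    tagger = make_tagger(train)
    remaining = list(unlabelled)
    for _ in range(rounds):
        added, kept = [], []
        for sent in remaining:
            tags, conf = tagger.tag_with_confidence(sent)
            if conf >= threshold:
                added.append(list(zip(sent, tags)))
            else:
                kept.append(sent)
        if not added:
            break  # nothing confident enough; stop before drifting
        train += added
        remaining = kept
        tagger = make_tagger(train)  # retrain on its own output
    return tagger
```

The confidence threshold is the only brake on the model-drift problem the excerpt mentions: once wrong but confident labels enter the training set, later rounds reinforce them.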
“…To overcome these issues, other techniques are used, namely: unsupervised strategies where no data is labeled and all annotations are discovered [21], and semi-supervised learning paradigms, where labeled data are used to annotate unlabeled data. Examples of these techniques include self-training [11,43] and co-training [6]. Active learning, which can be seen as an interactive semi-supervised technique, is also used to reduce annotation cost [35,36].…”
Section: Introduction (mentioning)
confidence: 99%