Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003
DOI: 10.3115/1119176.1119183

Bootstrapping POS taggers using unlabelled data

Abstract: This paper investigates bootstrapping part-of-speech taggers using co-training, in which two taggers are iteratively re-trained on each other's output. Since the output of the taggers is noisy, there is a question of which newly labelled examples to add to the training set. We investigate selecting examples by directly maximising tagger agreement on unlabelled data, a method which has been theoretically and empirically motivated in the co-training literature. Our results show that agreement-based co-training c…
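
To make the abstract's procedure concrete, here is a minimal Python sketch of agreement-based co-training. Everything in it is an assumption for illustration: the `make_t1`/`make_t2` factories, the `tag` interface, and the random subset-selection heuristic are hypothetical stand-ins, and the paper's actual taggers and selection procedure differ in detail.

```python
import random

def agreement(t1, t2, sentences):
    """Per-token agreement of two taggers on raw (untagged) sentences."""
    same = total = 0
    for sent in sentences:
        a, b = t1.tag(sent), t2.tag(sent)
        same += sum(x == y for x, y in zip(a, b))
        total += len(sent)
    return same / max(total, 1)

def co_train(make_t1, make_t2, seed, unlabelled, dev_unlabelled,
             rounds=5, cache_size=50, n_candidates=10):
    """Agreement-based co-training (hypothetical interface): make_t1 /
    make_t2 build a tagger from tagged sentences; tagger.tag() maps a
    token list to a tag list."""
    train1, train2 = list(seed), list(seed)
    t1, t2 = make_t1(train1), make_t2(train2)
    for _ in range(rounds):
        cache = random.sample(unlabelled, min(cache_size, len(unlabelled)))
        if not cache:
            break
        # Each tagger labels the cache; its output becomes candidate
        # training material for the *other* tagger.
        lab1 = [list(zip(s, t1.tag(s))) for s in cache]
        lab2 = [list(zip(s, t2.tag(s))) for s in cache]
        best = (agreement(t1, t2, dev_unlabelled), t1, t2, train1, train2)
        for _ in range(n_candidates):
            idx = random.sample(range(len(cache)), k=max(1, len(cache) // 2))
            cand1 = train1 + [lab2[i] for i in idx]  # t2 teaches t1
            cand2 = train2 + [lab1[i] for i in idx]  # t1 teaches t2
            c1, c2 = make_t1(cand1), make_t2(cand2)
            # Keep the subset that maximises agreement on held-out
            # unlabelled data, per the abstract's selection criterion.
            score = agreement(c1, c2, dev_unlabelled)
            if score > best[0]:
                best = (score, c1, c2, cand1, cand2)
        _, t1, t2, train1, train2 = best
    return t1, t2
```

Retraining both taggers for every candidate subset is expensive, which is why this sketch only samples a handful of random subsets per round; any practical implementation needs a similarly restricted search rather than exhaustive subset enumeration.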

Citation Types: 3 supporting, 40 mentioning, 0 contrasting

Year Published: 2005–2024


Cited by 75 publications (43 citation statements)
References 13 publications (18 reference statements)

“…The POS taggers experimented with are summarized in the following: when merging human-labeled data and auto-tagged data in the data combination, we simply gave our human-labeled training data a relative weight of one. Such results coincide with previous work on self-training for POS tagging (Clark et al 2003). We evaluated POS taggers on the English and Chinese test sets using per-token accuracy as well as the parsing accuracy of the baseline parser.…”
Section: Improved POS Tagging (supporting)
confidence: 83%
“…More recently, alternative methods based on system combination were proposed. Clark, Curran and Osborne (2003) adopted the self-training approach and achieved positive results only when human-labeled data are limited. Søgaard (2010) studied system combination in a tri-training framework.…”
Section: Related Work (mentioning)
confidence: 99%
“…In practice, one has to start somewhere, so an initial annotation is first obtained independently from the detector; a detector is then trained with the annotation and used as an annotator itself to refine the annotation, which in turn leads to the training of an improved detector. In essence, WSL for object detection is similar to self-training [4] although the training data is not completely unlabelled. It thus suffers from the model drift problem, that is, when the initial annotation is inaccurate, or wrong annotations are introduced in the iterative learning process, the model can drift away quickly.…”
Section: Introduction (mentioning)
confidence: 99%
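
The annotate-retrain loop described in the excerpt above is standard self-training, the baseline against which the paper's co-training is compared. Here is a minimal sketch, assuming a hypothetical tagger factory and a `tag_with_confidence` interface (neither comes from the cited papers):

```python
def self_train(make_tagger, seed, unlabelled, rounds=5, threshold=0.9):
    """Self-training: the model labels data for itself, keeping only
    confident predictions. `make_tagger` builds a tagger from tagged
    sentences; `tag_with_confidence` returns (tags, confidence)."""
    train = list(seed)
    tagger = make_tagger(train)
    remaining = list(unlabelled)
    for _ in range(rounds):
        added, kept = [], []
        for sent in remaining:
            tags, conf = tagger.tag_with_confidence(sent)
            if conf >= threshold:
                added.append(list(zip(sent, tags)))
            else:
                kept.append(sent)
        if not added:
            break  # nothing confident enough; stop before drifting
        train += added
        remaining = kept
        tagger = make_tagger(train)  # retrain on its own output
    return tagger
```

The confidence threshold is the only brake on the model-drift problem the excerpt mentions: once wrong but confident labels enter the training set, later rounds reinforce them.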
“…To overcome these issues, other techniques are used, namely: unsupervised strategies where no data is labeled and all annotations are discovered [21], and semi-supervised learning paradigms, where labeled data are used to annotate unlabeled data. Examples of these techniques include self-training [11,43] and co-training [6]. Active learning, which can be seen as an interactive semi-supervised technique, is also used to reduce annotation cost [35,36].…”
Section: Introduction (mentioning)
confidence: 99%