Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies (NAACL 2001)
DOI: 10.3115/1073336.1073359

Applying co-training methods to statistical parsing

Abstract: We propose a novel Co-Training method for statistical parsing. The algorithm takes as input a small corpus (9695 sentences) annotated with parse trees, a dictionary of possible lexicalized structures for each word in the training set and a large pool of unlabeled text. The algorithm iteratively labels the entire data set with parse trees. Using empirical results based on parsing the Wall Street Journal corpus we show that training a statistical parser on the combined labeled and unlabeled data strongly outperf…
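As a rough illustration of the general co-training idea the abstract describes (not the paper's parser-specific algorithm), here is a minimal two-view sketch in the style of Blum and Mitchell (1998). The threshold learner, the scalar feature views, and the distance-from-threshold confidence heuristic are all hypothetical simplifications introduced for this example:

```python
from statistics import mean

def train(view_data):
    """Hypothetical one-feature learner: a threshold halfway between
    the class means of this view's feature."""
    pos = [x for x, y in view_data if y == 1]
    neg = [x for x, y in view_data if y == 0]
    return (mean(pos) + mean(neg)) / 2.0

def predict(threshold, x):
    return 1 if x >= threshold else 0

def co_train(labeled, unlabeled, rounds=3):
    """labeled: list of ((view1, view2), y); unlabeled: list of (view1, view2).
    Each round, each view's classifier labels its most confident unlabeled
    example and adds it to the shared labeled set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        t1 = train([(v1, y) for (v1, _), y in labeled])
        t2 = train([(v2, y) for (_, v2), y in labeled])
        for t, view in ((t1, 0), (t2, 1)):
            if not pool:
                break
            # "Confidence" here is simply distance from the threshold.
            pool.sort(key=lambda ex: abs(ex[view] - t), reverse=True)
            ex = pool.pop(0)
            labeled.append((ex, predict(t, ex[view])))
    # Retrain both classifiers on the enlarged labeled set.
    t1 = train([(v1, y) for (v1, _), y in labeled])
    t2 = train([(v2, y) for (_, v2), y in labeled])
    return t1, t2
```

The key property co-training exploits is that the two views are (ideally) each sufficient to classify an example and conditionally independent given the label, so each classifier's confident predictions act as noisy supervision for the other.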

Cited by 88 publications (60 citation statements). References 24 publications.
“…However, although co-training has been used in many domains such as statistical parsing and noun phrase identification [22], [29], [33], [38], in most scenarios the requirement of sufficient and redundant views, or even the requirement of sufficient redundancy, could not be met. Therefore, researchers attempt to develop variants of the co-training algorithm for relaxing such a requirement.…”
Section: Semi-supervised Learning (mentioning; confidence: 99%)
“…Although co-training has already been successfully applied to some fields [20][21] [22], the requirement on two sufficient and redundant attribute subsets might be too strong to be met in many activity recognition systems.…”
Section: Introduction (mentioning; confidence: 99%)
“…Co-training (Blum and Mitchell, 1998), and several variants of co-training, have been applied to a number of NLP problems, including word sense disambiguation (Yarowsky, 1995), named entity recognition (Collins and Singer, 1999), noun phrase bracketing (Pierce and Cardie, 2001) and statistical parsing (Sarkar, 2001; Steedman et al, 2003). In each case, co-training was used successfully to bootstrap a model from only a small amount of labelled data and a much larger pool of unlabelled data.…”
Section: Introduction (mentioning; confidence: 99%)