Prototype-driven learning for sequence models

Haghighi, Aria; Klein, Dan

doi:10.3115/1220835.1220876

Cited by 80 publications

(104 citation statements)

References 11 publications

Mentioning citation statements: 102

“…Despite much previous work (Smith and Eisner, 2005; Johnson, 2007; Toutanova and Johnson, 2007; Haghighi and Klein, 2006; Berg-Kirkpatrick et al., 2010), results on this task are complicated by varying assumptions and unclear evaluation metrics (Christodoulopoulos et al., 2010). Perhaps most importantly, they are not good enough to be practical.…”

Section: Introduction (mentioning, confidence: 99%)


Simple Semi-Supervised POS Tagging

Stratos; Collins (2015)

Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing


We tackle the question: how much supervision is needed to achieve state-of-the-art performance in part-of-speech (POS) tagging, if we leverage lexical representations given by the model of Brown et al. (1992)? It has become a standard practice to use automatically induced "Brown clusters" in place of POS tags. We claim that the underlying sequence model for these clusters is particularly well-suited for capturing POS tags. We empirically demonstrate this claim by drastically reducing supervision in POS tagging with these representations. Using either the bit-string form given by the algorithm of Brown et al. (1992) or the (less well-known) embedding form given by the canonical correlation analysis algorithm of Stratos et al. (2014), we can obtain 93% tagging accuracy with just 400 labeled words and achieve state-of-the-art accuracy (> 97%) with less than 1 percent of the original training data.
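The abstract does not say how the Brown bit strings enter the tagger. One widespread convention, assumed here, uses prefixes of each word's bit string as features, so that words sharing a subtree of the merge hierarchy also share coarse features. The cluster table and prefix lengths below are hypothetical toys; a real table would come from running the Brown et al. (1992) algorithm on unlabeled text.

```python
from typing import Dict, List, Sequence


def brown_prefix_features(word: str, clusters: Dict[str, str],
                          lengths: Sequence[int] = (2, 4, 6)) -> List[str]:
    """Feature strings built from prefixes of a word's Brown bit string.

    `clusters` maps word -> bit string (the path from the root of the
    merge tree). Shorter prefixes correspond to coarser clusters.
    """
    bits = clusters.get(word)
    if bits is None:
        return ["brown=UNK"]  # word not seen when the clusters were induced
    # Only emit prefixes strictly shorter than the full path.
    feats = [f"brown{k}={bits[:k]}" for k in lengths if k < len(bits)]
    feats.append(f"brown={bits}")  # full path = finest cluster
    return feats


# Hypothetical toy cluster table (not taken from any real clustering run).
CLUSTERS = {"dog": "0110", "cat": "0111", "ran": "1010", "walked": "1011"}
print(brown_prefix_features("dog", CLUSTERS))  # ['brown2=01', 'brown=0110']
```

Here "dog" and "cat" share the coarse feature `brown2=01` but differ in their full paths, which is the property that lets a tagger generalize from very few labeled words.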



“…Perhaps most importantly, they are not good enough to be practical. Even with indirect supervision, for example the prototype-driven method of Haghighi and Klein (2006), which assumes a set of word examples for each tag type, the best per-position accuracy remains in the mid-70% range.…”

Section: Introduction (mentioning, confidence: 99%)
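The quote summarizes the supervision used by the cited paper: a handful of example words ("prototypes") per tag. Haghighi and Klein (2006) spread that information to other words through distributional-similarity features. The sketch below is a minimal illustration that assumes raw left/right-neighbour count vectors, cosine similarity, and a hypothetical threshold; the paper's own context features, dimensionality reduction, and cutoffs are not reproduced.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def context_vectors(sentences: List[List[str]]) -> Dict[str, Counter]:
    """Count left and right neighbour words for every word type."""
    vecs: Dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            vecs[padded[i]]["L=" + padded[i - 1]] += 1
            vecs[padded[i]]["R=" + padded[i + 1]] += 1
    return vecs


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())  # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def prototype_features(word: str, vecs: Dict[str, Counter],
                       prototypes: Dict[str, str],
                       threshold: float = 0.5) -> List[str]:
    """Fire 'proto=p' for each prototype p the word equals or resembles.

    `threshold` is a hypothetical cutoff, not the value used in the paper.
    The tag of each prototype (the dict value) is what the model can learn
    to associate with the feature.
    """
    feats = []
    wv = vecs.get(word, Counter())
    for proto in prototypes:
        if word == proto or cosine(wv, vecs.get(proto, Counter())) >= threshold:
            feats.append(f"proto={proto}")
    return feats


# Hypothetical toy corpus and prototype list.
SENTS = [["the", "dog", "runs"], ["a", "cat", "runs"],
         ["the", "cat", "sleeps"], ["a", "dog", "sleeps"]]
PROTOS = {"dog": "NN", "runs": "VBZ"}
vecs = context_vectors(SENTS)
print(prototype_features("cat", vecs, PROTOS))  # ['proto=dog']
```

"cat" never appears as a prototype, but it occurs in exactly the same contexts as "dog", so it inherits the `proto=dog` feature.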


“…Similarly, Wu and Srihari [34] assigned labels to unlabeled documents with 'labeled features' and then used these pseudo-examples in conjunction with labeled examples to train a weighted margin Support Vector Machine with regularization. Later, Haghighi and Klein [36] explored similar "labeled features" for a "pseudo-example" strategy of training a generative MRF sequence model.…”

Section: Semi-supervised Learning With Labeled Features (mentioning, confidence: 99%)
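The quote describes the "labeled features → pseudo-examples" strategy without saying how the feature labels are combined. The sketch below assumes a simple majority vote over the labeled features present in each unlabeled document; the voting rule, the function names, and the toy feature table are all hypothetical, and the weighted-margin SVM that would consume the resulting examples is omitted.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Example = Tuple[List[str], str]


def pseudo_label(tokens: List[str], labeled_features: Dict[str, str]) -> Optional[str]:
    """Majority vote over labeled features found in the document (assumed rule)."""
    votes = Counter(labeled_features[t] for t in tokens if t in labeled_features)
    if not votes:
        return None  # no labeled feature fires: leave the document unlabeled
    return votes.most_common(1)[0][0]


def build_training_set(labeled: List[Example], unlabeled: List[List[str]],
                       labeled_features: Dict[str, str]) -> List[Example]:
    """Gold examples plus every unlabeled document that received a pseudo-label."""
    pseudo = []
    for doc in unlabeled:
        label = pseudo_label(doc, labeled_features)
        if label is not None:
            pseudo.append((doc, label))
    return list(labeled) + pseudo


# Hypothetical labeled features: word -> class.
LF = {"goal": "sports", "match": "sports", "vote": "politics"}
print(pseudo_label(["the", "goal", "match", "vote"], LF))  # sports
```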

Semi-Supervised Sequence Labeling with Self-Learned Features

Qi; Kuksa; Collobert; et al. (2009)

2009 Ninth IEEE International Conference on Data Mining


Typical information extraction (IE) systems can be seen as tasks assigning labels to words in a natural language sequence. Their performance is limited by the availability of labeled words. To tackle this issue, we propose a semi-supervised approach to improve the sequence labeling procedure in IE through a class of algorithms with self-learned features (SLF). A supervised classifier can be trained with annotated text sequences and used to classify each word in a large set of unannotated sentences. By averaging predicted labels over all cases in the unlabeled corpus, SLF training builds class label distribution patterns for each word (or word attribute) in the dictionary and iteratively re-trains the current model, adding these distributions as extra word features. Basic SLF models how likely a word is to be assigned to each target class type. Several extensions are proposed, such as learning words' class boundary distributions. SLF exhibits robust and scalable behaviour and is easy to tune. We applied this approach to four classical IE tasks: named entity recognition (German and English), part-of-speech tagging (English), and one gene name recognition corpus. Experimental results show effective improvements over the supervised baselines on all tasks. In addition, the approach compares favorably with the closely related self-training idea.
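The core SLF step described above (tag the unlabeled corpus, average the predictions per word, feed the averages back as features) can be sketched as follows. This assumes hard predicted labels are counted; averaging predicted probabilities would be an equally plausible reading. `toy_predict` is a hypothetical stand-in for the trained classifier.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List


def slf_distributions(unlabeled: List[List[str]],
                      predict: Callable[[List[str]], List[str]],
                      labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-word distribution of the labels predicted over an unlabeled corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in unlabeled:
        for word, label in zip(sent, predict(sent)):
            counts[word][label] += 1
    return {word: {lab: c[lab] / sum(c.values()) for lab in labels}
            for word, c in counts.items()}


def add_slf_features(sent: List[str], dists: Dict[str, Dict[str, float]],
                     labels: List[str]) -> List[Dict[str, float]]:
    """Extra real-valued features per token; unseen words get all zeros."""
    unknown = {lab: 0.0 for lab in labels}
    return [{f"slf={lab}": dists.get(w, unknown)[lab] for lab in labels} for w in sent]


def toy_predict(sent: List[str]) -> List[str]:
    # Hypothetical rule: capitalised and not sentence-initial -> entity.
    return ["ENT" if i > 0 and w[:1].isupper() else "O" for i, w in enumerate(sent)]


LABELS = ["ENT", "O"]
corpus = [["Paris", "is", "big"], ["visit", "Paris", "now"], ["paris", "is", "Paris"]]
dists = slf_distributions(corpus, toy_predict, LABELS)
# "Paris" is tagged ENT twice and O once (sentence-initial): ENT 2/3, O 1/3.
```

In the full procedure the tagger would then be retrained with these features, the corpus re-tagged, and the distributions recomputed; that outer loop is left out here.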


“…To address this, we propose a novel voting scheme that is inspired by the widely-used 1-to-1 accuracy metric for POS induction (Haghighi and Klein, 2006). This metric maps system tags to gold tags to maximize accuracy, with the constraint that each gold tag is the target of at most one system tag.…”

Section: System Combination (mentioning, confidence: 99%)
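The 1-to-1 metric described in the quote is a maximum-weight bipartite matching between system tags and gold tags. The sketch below finds that matching by brute force, which is exponential and only suitable for illustrating the definition on tiny tag sets; a practical implementation would use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
from collections import Counter
from itertools import permutations
from typing import List, Optional, Sequence


def one_to_one_accuracy(pred: Sequence[str], gold: Sequence[str]) -> float:
    """Best accuracy under an injective map from system tags to gold tags.

    Brute force over every injective assignment; feasible only for tiny tag sets.
    """
    assert len(pred) == len(gold) > 0
    pair_counts = Counter(zip(pred, gold))  # (system tag, gold tag) co-occurrences
    sys_tags = sorted(set(pred))
    gold_tags = sorted(set(gold))
    # Pad with None so a system tag may stay unmapped (counts as wrong)
    # when there are more system tags than gold tags.
    targets: List[Optional[str]] = gold_tags + [None] * len(sys_tags)
    best = 0
    for assignment in permutations(targets, len(sys_tags)):
        correct = sum(pair_counts[(s, g)]
                      for s, g in zip(sys_tags, assignment) if g is not None)
        best = max(best, correct)
    return best / len(gold)


# A and B both co-occur with N, but only one of them may map to N.
print(one_to_one_accuracy(["A", "A", "B", "B", "C"],
                          ["N", "N", "V", "N", "V"]))  # 0.6
```

A many-to-1 mapping on the same data would score 0.8 (both B tokens could map to their majority gold tag and C to V); the injectivity constraint is what keeps the 1-to-1 score from rewarding an inflated number of clusters.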

“…For this reason, EM is typically only used to train log-linear model weights when Z(θ) = 1, e.g., for hidden Markov models, probabilistic context-free grammars, and models composed of locally-normalized log-linear models (Berg-Kirkpatrick et al., 2010), among others. There have been efforts at approximating the summation over elements of X, whether by limiting sequence length (Haghighi and Klein, 2006), only summing over observations in the training data (Riezler, 1999), restricting the observation space based on the task, or using Gibbs sampling to obtain an unbiased sample of the full space (Della Pietra et al., 1997; Rosenfeld, 1997).…”

Section: EM and Contrastive Estimation (mentioning, confidence: 99%)
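The quote credits Haghighi and Klein (2006) with approximating the normalizer by limiting sequence length. The sketch below illustrates that idea on a hypothetical toy first-order model with only emission and transition log-potentials (not the paper's actual model or features). Because every word position is summed out independently given its tag, a forward-style recurrence over tags gives Z restricted to lengths 1..L; a brute-force enumerator is included as a check.

```python
import itertools
import math
from typing import List


def length_limited_partition(emit: List[List[float]],
                             trans: List[List[float]],
                             max_len: int) -> float:
    """Sum of exp(score(x, y)) over all word/tag sequences of length 1..max_len.

    emit[t][v]  : log-potential of tag t emitting word v (toy, hypothetical)
    trans[s][t] : log-potential of tag s followed by tag t (toy, hypothetical)
    """
    n_tags = len(emit)
    # Sum the word out at each position: e[t] = sum_v exp(emit[t][v]).
    e = [sum(math.exp(w) for w in row) for row in emit]
    alpha = e[:]        # alpha[t]: total mass of length-n prefixes ending in tag t
    total = sum(alpha)  # contribution of all length-1 sequences
    for _ in range(2, max_len + 1):
        alpha = [e[t] * sum(alpha[s] * math.exp(trans[s][t]) for s in range(n_tags))
                 for t in range(n_tags)]
        total += sum(alpha)
    return total


def brute_force_partition(emit: List[List[float]],
                          trans: List[List[float]],
                          max_len: int) -> float:
    """Exponential-time reference: enumerate every (x, y) pair explicitly."""
    n_tags, n_words = len(emit), len(emit[0])
    z = 0.0
    for n in range(1, max_len + 1):
        for ys in itertools.product(range(n_tags), repeat=n):
            for xs in itertools.product(range(n_words), repeat=n):
                score = sum(emit[y][x] for y, x in zip(ys, xs))
                score += sum(trans[a][b] for a, b in zip(ys, ys[1:]))
                z += math.exp(score)
    return z


EMIT = [[0.1, -0.3], [0.5, 0.2]]   # 2 tags x 2 words
TRANS = [[0.0, 0.4], [-0.2, 0.3]]  # 2 tags x 2 tags
assert math.isclose(length_limited_partition(EMIT, TRANS, 3),
                    brute_force_partition(EMIT, TRANS, 3))
```

Without a length cap the sum over n is an infinite series that can diverge when the per-step mass exceeds one, which is one reason a finite cap is a convenient approximation.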

Weakly-Supervised Learning with Cost-Augmented Contrastive Estimation

Gimpel; Bansal (2014)

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)


We generalize contrastive estimation in two ways that permit adding more knowledge to unsupervised learning. The first allows the modeler to specify not only the set of corrupted inputs for each observation, but also how bad each one is. The second allows specifying structural preferences on the latent variable used to explain the observations. Both require setting additional hyperparameters, which can be problematic in unsupervised learning, so we investigate new methods for unsupervised model selection and system combination. We instantiate these ideas for part-of-speech induction without tag dictionaries, improving over contrastive estimation as well as strong benchmarks from the PASCAL 2012 shared task.
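Contrastive estimation (Smith and Eisner, 2005) scores an observation against a neighbourhood of corrupted versions of itself rather than against all possible inputs. The sketch below uses the adjacent-transposition (TRANS1) neighbourhood on a hypothetical toy first-order tagging model. To illustrate "how bad each one is", it assumes each corrupted input's log-score in the denominator is increased by a cost, with a hypothetical Hamming cost; this is one plausible reading of the abstract, not the paper's exact formulation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Seq = Tuple[int, ...]


def logsumexp(vals: Sequence[float]) -> float:
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def log_sum_over_tags(x: Seq, emit: List[List[float]], trans: List[List[float]]) -> float:
    """log sum_y exp(score(x, y)) for a toy first-order model (forward algorithm)."""
    n_tags = len(emit)
    alpha = [emit[t][x[0]] for t in range(n_tags)]
    for v in x[1:]:
        alpha = [emit[t][v] + logsumexp([alpha[s] + trans[s][t] for s in range(n_tags)])
                 for t in range(n_tags)]
    return logsumexp(alpha)


def trans1_neighbourhood(x: Seq) -> List[Seq]:
    """x plus every sequence obtained by swapping one adjacent pair (TRANS1)."""
    out = [x]
    for i in range(len(x) - 1):
        y = list(x)
        y[i], y[i + 1] = y[i + 1], y[i]
        out.append(tuple(y))
    return list(dict.fromkeys(out))  # drop duplicate neighbours, keep order


def hamming(x: Seq, x_prime: Seq) -> float:
    """Hypothetical cost: positions where the corrupted input differs from x."""
    return float(sum(a != b for a, b in zip(x, x_prime)))


def cost_augmented_ce(x: Seq, emit, trans, cost: Callable[[Seq, Seq], float]) -> float:
    """log [mass of x / cost-augmented mass of its neighbourhood]; cost 0 gives plain CE."""
    numerator = log_sum_over_tags(x, emit, trans)
    denominator = logsumexp([log_sum_over_tags(xp, emit, trans) + cost(x, xp)
                             for xp in trans1_neighbourhood(x)])
    return numerator - denominator


EMIT = [[0.1, -0.3], [0.5, 0.2]]   # toy 2 tags x 2 words
TRANS = [[0.0, 0.4], [-0.2, 0.3]]
x = (0, 1, 0)
plain = cost_augmented_ce(x, EMIT, TRANS, lambda a, b: 0.0)
augmented = cost_augmented_ce(x, EMIT, TRANS, hamming)
```

Adding cost to badly corrupted neighbours enlarges the denominator, so maximizing the objective pushes the model harder to assign those neighbours low mass; with zero cost the function reduces to ordinary contrastive estimation.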

