Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2014
DOI: 10.3115/v1/p14-1099

A Provably Correct Learning Algorithm for Latent-Variable PCFGs

Abstract: We introduce a provably correct learning algorithm for latent-variable PCFGs. The algorithm relies on two steps: first, the use of a matrix-decomposition algorithm applied to a co-occurrence matrix estimated from the parse trees in a training sample; second, the use of EM applied to a convex objective derived from the training samples in combination with the output from the matrix decomposition. Experiments on parsing and a language modeling problem show that the algorithm is efficient and effective in practice.
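As a rough illustration of the abstract's two-step pipeline, the toy sketch below decomposes a synthetic co-occurrence matrix with a truncated SVD and then refines a latent-state mixture with EM. Everything here (the mixture model, the SVD-based initialization, the variable names) is an illustrative assumption; in particular, the paper's second step optimizes a convex objective, whereas this sketch uses plain mixture EM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a co-occurrence matrix estimated from parse trees:
# joint probabilities of (inside-feature, outside-feature) pairs
# generated by m latent states.
m, V = 2, 6
true_A = rng.dirichlet(np.ones(V), size=m)   # P(x | h), one row per state
true_B = rng.dirichlet(np.ones(V), size=m)   # P(y | h)
pi = np.array([0.6, 0.4])                    # P(h)
Q = sum(pi[h] * np.outer(true_A[h], true_B[h]) for h in range(m))

# Step 1 (sketch): matrix decomposition -- a rank-m truncated SVD here.
U, s, Vt = np.linalg.svd(Q)
U_m = U[:, :m]

# Step 2 (sketch): EM over a latent-state mixture, initialized from the
# decomposition output. Plain mixture EM, not the paper's convex objective.
draws = rng.choice(m, size=2000, p=pi)
xs = np.array([rng.choice(V, p=true_A[h]) for h in draws])
ys = np.array([rng.choice(V, p=true_B[h]) for h in draws])

A = (np.abs(U_m) / np.abs(U_m).sum(axis=0)).T   # heuristic init from the SVD
B = rng.dirichlet(np.ones(V), size=m)
w = np.full(m, 1.0 / m)
for _ in range(100):
    post = w * A[:, xs].T * B[:, ys].T          # E-step: P(h | x, y) per pair
    post /= post.sum(axis=1, keepdims=True)
    w = post.mean(axis=0)                       # M-step: reestimate parameters
    for h in range(m):
        A[h] = np.bincount(xs, weights=post[:, h], minlength=V)
        B[h] = np.bincount(ys, weights=post[:, h], minlength=V)
    A /= A.sum(axis=1, keepdims=True)
    B /= B.sum(axis=1, keepdims=True)
```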

Cited by 17 publications (17 citation statements) · References 23 publications
“…Notably, another class of methods, based on subspace identification (Overschee and Moor, 1996) and observable operator models/multiplicity automata (Schützenberger, 1961; Jaeger, 2000; Littman et al., 2001), has been proposed for a number of latent variable models. These methods were successfully developed for HMMs, and subsequently generalized and extended to a number of related sequential and tree Markov models (Bailly, 2011; Parikh et al., 2011; Rodu et al., 2013; Balle and Mohri, 2012), as well as certain classes of parse tree models (Luque et al., 2012; Cohen et al., 2012; Dhillon et al., 2012). These methods use low-order moments to learn an "operator" representation of the distribution, which can be used for density estimation and belief state updates.…”
Section: Latent Variable Models (mentioning)
confidence: 99%
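For concreteness, below is a minimal runnable sketch of the operator-style construction this excerpt describes, for an HMM whose low-order moments are computed exactly (in practice they are estimated from data). It follows the well-known SVD-based observable-operator recipe; the variable names and the toy check against the forward algorithm are this sketch's own assumptions, not code from any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 2, 4                                # hidden states, observation symbols
T = rng.dirichlet(np.ones(k), size=k).T    # T[i, j] = P(h' = i | h = j)
O = rng.dirichlet(np.ones(n), size=k).T    # O[x, h] = P(x | h)
pi = rng.dirichlet(np.ones(k))             # initial state distribution

# Low-order moments: unigram vector, pair matrix, and triple slices.
P1  = O @ pi
P21 = O @ T @ np.diag(pi) @ O.T
P3  = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T for x in range(n)]

# Observable-operator parameters via an SVD of the pair moments.
U = np.linalg.svd(P21)[0][:, :k]
b1   = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3[x] @ np.linalg.pinv(U.T @ P21) for x in range(n)]

def spectral_prob(seq):
    """Density estimation with the operator representation."""
    b = b1
    for x in seq:
        b = B[x] @ b                       # belief-state style update
    return binf @ b

def forward_prob(seq):
    """Ground truth via the standard forward algorithm."""
    a = pi * O[seq[0]]
    for x in seq[1:]:
        a = O[x] * (T @ a)
    return a.sum()

seq = [0, 2, 1, 3]
assert np.isclose(spectral_prob(seq), forward_prob(seq))
```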
“…Besides sequential models, spectral learning algorithms for tree-like structures appearing in context-free grammatical models and probabilistic graphical models have also been considered (Bailly et al., 2010; Parikh et al., 2011; Luque et al., 2012; Cohen et al., 2012; Dhillon et al., 2012). In Sect.…”
Section: Related Work (mentioning)
confidence: 99%
“…At their core, spectral algorithms exploit the conditional independence assumptions that L-PCFGs make to extract the parameters associated with the latent states (Cohen et al., 2013, 2014). More specifically, L-PCFGs assume that an "inside" tree and an "outside" tree, shown in Figure 2, are conditionally independent of each other given the nonterminal and latent state that attach them to each other.…”
Section: Spectral Learning (mentioning)
confidence: 99%
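Stated as an equation, the independence assumption this excerpt describes takes the following form (the notation here, with inside tree t, outside tree o, nonterminal a, and latent state h, is assumed for exposition and may differ from the cited papers' exact symbols):

```latex
% Inside and outside trees are conditionally independent given (a, h):
p(t, o \mid a, h) \;=\; p(t \mid a, h)\, p(o \mid a, h).

% Consequently, for inside/outside feature maps \phi and \psi, the
% cross-moment matrix factors through the m latent states:
\Omega_a \;=\; \mathbb{E}\bigl[\phi(t)\,\psi(o)^{\top} \mid a\bigr]
         \;=\; \sum_{h=1}^{m} p(h \mid a)\,
               \mathbb{E}[\phi(t) \mid a, h]\,
               \mathbb{E}[\psi(o) \mid a, h]^{\top},
% so \Omega_a has rank at most m, which is the low-rank structure an
% SVD-based spectral algorithm exploits.
```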
“…2), there is also a formulation for the outside algorithm (Cohen et al., 2014). Le and Zuidema (2014) also extended the recursive neural networks mentioned above to make use of the outside-tree information.…”
(mentioning)
confidence: 99%