Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2014
DOI: 10.3115/v1/p14-1099

A Provably Correct Learning Algorithm for Latent-Variable PCFGs

Abstract: We introduce a provably correct learning algorithm for latent-variable PCFGs. The algorithm relies on two steps: first, the use of a matrix-decomposition algorithm applied to a co-occurrence matrix estimated from the parse trees in a training sample; second, the use of EM applied to a convex objective derived from the training samples in combination with the output from the matrix decomposition. Experiments on parsing and a language modeling problem show that the algorithm is efficient and effective in practice.
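As a rough illustration of the abstract's two-step pipeline, the toy sketch below decomposes a synthetic co-occurrence matrix with a truncated SVD and then refines a latent-state mixture with EM. Everything here (the mixture model, the SVD-based initialization, the variable names) is an illustrative assumption; in particular, the paper's second step optimizes a convex objective, whereas this sketch uses plain mixture EM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a co-occurrence matrix estimated from parse trees:
# joint probabilities of (inside-feature, outside-feature) pairs
# generated by m latent states.
m, V = 2, 6
true_A = rng.dirichlet(np.ones(V), size=m)   # P(x | h), one row per state
true_B = rng.dirichlet(np.ones(V), size=m)   # P(y | h)
pi = np.array([0.6, 0.4])                    # P(h)
Q = sum(pi[h] * np.outer(true_A[h], true_B[h]) for h in range(m))

# Step 1 (sketch): matrix decomposition -- a rank-m truncated SVD here.
U, s, Vt = np.linalg.svd(Q)
U_m = U[:, :m]

# Step 2 (sketch): EM over a latent-state mixture, initialized from the
# decomposition output. Plain mixture EM, not the paper's convex objective.
draws = rng.choice(m, size=2000, p=pi)
xs = np.array([rng.choice(V, p=true_A[h]) for h in draws])
ys = np.array([rng.choice(V, p=true_B[h]) for h in draws])

A = (np.abs(U_m) / np.abs(U_m).sum(axis=0)).T   # heuristic init from the SVD
B = rng.dirichlet(np.ones(V), size=m)
w = np.full(m, 1.0 / m)
for _ in range(100):
    post = w * A[:, xs].T * B[:, ys].T          # E-step: P(h | x, y) per pair
    post /= post.sum(axis=1, keepdims=True)
    w = post.mean(axis=0)                       # M-step: reestimate parameters
    for h in range(m):
        A[h] = np.bincount(xs, weights=post[:, h], minlength=V)
        B[h] = np.bincount(ys, weights=post[:, h], minlength=V)
    A /= A.sum(axis=1, keepdims=True)
    B /= B.sum(axis=1, keepdims=True)
```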

Cited by 17 publications (17 citation statements) · References 23 publications
“…Notably, another class of methods, based on subspace identification (Overschee and Moor, 1996) and observable operator models/multiplicity automata (Schützenberger, 1961; Jaeger, 2000; Littman et al., 2001), has been proposed for a number of latent variable models. These methods were successfully developed for HMMs, and subsequently generalized and extended to a number of related sequential and tree Markov models (Bailly, 2011; Parikh et al., 2011; Rodu et al., 2013; Balle and Mohri, 2012), as well as certain classes of parse tree models (Luque et al., 2012; Cohen et al., 2012; Dhillon et al., 2012). These methods use low-order moments to learn an "operator" representation of the distribution, which can be used for density estimation and belief state updates.…”
Section: Latent Variable Models (mentioning)
confidence: 99%
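For concreteness, below is a minimal runnable sketch of the operator-style construction this excerpt describes, for an HMM whose low-order moments are computed exactly (in practice they are estimated from data). It follows the well-known SVD-based observable-operator recipe; the variable names and the toy check against the forward algorithm are this sketch's own assumptions, not code from any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 2, 4                                # hidden states, observation symbols
T = rng.dirichlet(np.ones(k), size=k).T    # T[i, j] = P(h' = i | h = j)
O = rng.dirichlet(np.ones(n), size=k).T    # O[x, h] = P(x | h)
pi = rng.dirichlet(np.ones(k))             # initial state distribution

# Low-order moments: unigram vector, pair matrix, and triple slices.
P1  = O @ pi
P21 = O @ T @ np.diag(pi) @ O.T
P3  = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T for x in range(n)]

# Observable-operator parameters via an SVD of the pair moments.
U = np.linalg.svd(P21)[0][:, :k]
b1   = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3[x] @ np.linalg.pinv(U.T @ P21) for x in range(n)]

def spectral_prob(seq):
    """Density estimation with the operator representation."""
    b = b1
    for x in seq:
        b = B[x] @ b                       # belief-state style update
    return binf @ b

def forward_prob(seq):
    """Ground truth via the standard forward algorithm."""
    a = pi * O[seq[0]]
    for x in seq[1:]:
        a = O[x] * (T @ a)
    return a.sum()

seq = [0, 2, 1, 3]
assert np.isclose(spectral_prob(seq), forward_prob(seq))
```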
“…Besides sequential models, spectral learning algorithms for tree-like structures appearing in context-free grammatical models and probabilistic graphical models have also been considered (Bailly et al., 2010; Parikh et al., 2011; Luque et al., 2012; Cohen et al., 2012; Dhillon et al., 2012). In Sect.…”
Section: Related Work (mentioning)
confidence: 99%
“…At their core, spectral algorithms exploit the conditional independence assumptions that L-PCFGs make to extract the parameters associated with the latent states (Cohen et al., 2013, 2014). More specifically, L-PCFGs assume that an "inside" tree and an "outside" tree, shown in Figure 2, are conditionally independent of each other given the nonterminal and latent state that attach them to each other.…”
Section: Spectral Learning (mentioning)
confidence: 99%
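Stated as an equation, the independence assumption this excerpt describes takes the following form (the notation here, with inside tree t, outside tree o, nonterminal a, and latent state h, is assumed for exposition and may differ from the cited papers' exact symbols):

```latex
% Inside and outside trees are conditionally independent given (a, h):
p(t, o \mid a, h) \;=\; p(t \mid a, h)\, p(o \mid a, h).

% Consequently, for inside/outside feature maps \phi and \psi, the
% cross-moment matrix factors through the m latent states:
\Omega_a \;=\; \mathbb{E}\bigl[\phi(t)\,\psi(o)^{\top} \mid a\bigr]
         \;=\; \sum_{h=1}^{m} p(h \mid a)\,
               \mathbb{E}[\phi(t) \mid a, h]\,
               \mathbb{E}[\psi(o) \mid a, h]^{\top},
% so \Omega_a has rank at most m, which is the low-rank structure an
% SVD-based spectral algorithm exploits.
```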
“…2), there is also a formulation for the outside algorithm (Cohen et al., 2014). Le and Zuidema (2014) also extended the recursive neural networks mentioned above to make use of the outside-tree information.…”
(mentioning)
confidence: 99%