2011
DOI: 10.1007/978-3-642-23780-5_20

A Spectral Learning Algorithm for Finite State Transducers

Abstract: Finite-State Transducers (FSTs) are a popular tool for modeling paired input-output sequences and have numerous applications in real-world problems. Most training algorithms for learning FSTs rely on gradient-based or EM optimizations, which can be computationally expensive and suffer from local-optima issues. Recently, Hsu et al. [13] proposed a spectral method for learning Hidden Markov Models (HMMs) which is based on an Observable Operator Model (OOM) view of HMMs. Following this line of work we p…
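The spectral recipe the abstract alludes to can be illustrated on a toy weighted automaton rather than a full FST: build Hankel matrices of string weights, take a truncated SVD, and solve linear systems for observable operators. The sketch below is a minimal, hypothetical example in the style of Hsu et al. / Balle et al. (all names and the ground-truth automaton are illustrative, not from the paper):

```python
import numpy as np

# Ground-truth 2-state weighted automaton over alphabet {0, 1} (toy example).
a1 = np.array([1.0, 0.0])          # initial weight vector
ainf = np.array([0.3, 0.7])        # final weight vector
A = {0: np.array([[0.5, 0.2], [0.1, 0.3]]),
     1: np.array([[0.1, 0.4], [0.2, 0.2]])}

def f(x):
    """Weight the automaton assigns to string x (a tuple of symbols)."""
    v = a1.copy()
    for sym in x:
        v = v @ A[sym]
    return float(v @ ainf)

# Hankel matrices over a small prefix/suffix basis: H[u, v] = f(uv),
# and one "shifted" matrix per symbol: Ha[s][u, v] = f(u s v).
prefixes = [(), (0,), (1,)]
suffixes = [(), (0,), (1,)]
H = np.array([[f(u + v) for v in suffixes] for u in prefixes])
Ha = {s: np.array([[f(u + (s,) + v) for v in suffixes] for u in prefixes])
      for s in (0, 1)}
hS = np.array([f(v) for v in suffixes])   # row of H for the empty prefix
hP = np.array([f(u) for u in prefixes])   # column of H for the empty suffix

# Spectral step: a rank-2 truncated SVD of H yields the factorization used
# to solve for observable operators B_s (defined only up to a change of basis).
U, s, Vt = np.linalg.svd(H)
V = Vt[:2].T
pinv = np.linalg.pinv(H @ V)
B = {sym: pinv @ Ha[sym] @ V for sym in (0, 1)}
b1 = hS @ V           # learned initial vector
binf = pinv @ hP      # learned final vector

def f_hat(x):
    """String weight computed with the recovered operators."""
    v = b1.copy()
    for sym in x:
        v = v @ B[sym]
    return float(v @ binf)

print(abs(f((0, 1, 0)) - f_hat((0, 1, 0))) < 1e-8)  # → True
```

With exact Hankel statistics and a basis that attains the automaton's rank, the recovered operators reproduce the target function exactly; in practice the entries of H and Ha would be empirical frequencies, and the SVD step is what makes the method free of local optima.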

Cited by 14 publications (14 citation statements) · References 22 publications (43 reference statements)
“…In the last years multiple spectral learning algorithms have been proposed for a wide range of models. Many of these models deal with data whose nature is eminently sequential, like the work of Bailly et al (2009) on WFA, or other works on particular subclasses of WFA like HMM (Hsu et al 2009) and related extensions (Siddiqi et al 2010;Song et al 2010), Predictive State Representations (PSR) (Boots et al 2011), Finite State Transducers (FST) (Balle et al 2011), and Quadratic Weighted Automata (QWA) (Bailly 2011). Besides direct applications of the spectral algorithm to different classes of sequential models, the method has also been combined with convex optimization algorithms in , Balle and Mohri (2012).…”
Section: Related Work (confidence: 99%)
“…Recently a number of researchers have developed provably correct algorithms for parameter estimation in latent variable models such as hidden Markov models, topic models, directed graphical models with latent variables, and so on (Hsu et al, 2009;Bailly et al, 2010;Siddiqi et al, 2010;Parikh et al, 2011;Balle et al, 2011;Arora et al, 2013;Dhillon et al, 2012;Anandkumar et al, 2012;Arora et al, 2012;Arora et al, 2013). Many of these algorithms have their roots in spectral methods such as canonical correlation analysis (CCA) (Hotelling, 1936), or higher-order tensor decompositions.…”
Section: Related Work (confidence: 99%)
“…It is also important to note that MLE is not the only option for estimating finite state probabilistic grammars. There have been some recent advances in learning finite state models (HMMs and finite state transducers) by using spectral analysis of matrices which consist of quantities estimated from observations only (Hsu, Kakade, and Zhang 2009; Balle, Quattoni, and Carreras 2011), based on the observable operator models of Jaeger (1999). These algorithms are not prone to local minima, and converge to the correct model as the number of samples increases, but require some assumptions about the underlying model that generates the data.…”
Section: Discussion (confidence: 99%)