“…Despite the improvements in unsupervised parsing obtained through scaling neural probabilistic context-free grammars (PCFGs), their language model performance scales less favorably than that of, for example, hidden Markov models (HMMs) and neural language models. On the Penn Treebank, a neural PCFG with 30 nonterminals and 60 preterminals obtains ≈ 250 perplexity (Kim et al., 2019), and while scaling neural PCFGs to thousands of states via a low-rank parameterization can improve perplexity to ≈ 170 (Yang et al., 2022), this still lags behind a similarly sized HMM, which obtains ≈ 130 perplexity (Chiu et al., 2021), despite the fact that HMMs are a subclass of PCFGs. This work proposes SimplePCFG, a simple PCFG formalism with independent left and right productions. We find that this simple PCFG scales more effectively (in terms of both language modeling and unsupervised parsing) than previous approaches that scale PCFGs by factorizing the rule probability tensor into low-rank components (Yang et al., 2021b, 2022).…”
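To make the three parameterizations concrete, below is a minimal sketch in PyTorch contrasting a full binary-rule tensor, a low-rank factorization in the spirit of Yang et al. (2022), and independent left/right productions as described for SimplePCFG. The state sizes `N` and `R` are illustrative values, not taken from the cited papers, and the actual models parameterize these distributions with neural networks rather than free logits.

```python
import torch

N = 32  # number of nonterminals (illustrative, not from the paper)
R = 16  # rank of the low-rank factorization (illustrative)

# Full PCFG: one probability per binary rule A -> B C,
# an N x N x N tensor normalized over child pairs (B, C).
full_logits = torch.randn(N, N, N)
full_rules = torch.softmax(full_logits.view(N, -1), dim=-1).view(N, N, N)

# Low-rank PCFG (in the spirit of Yang et al., 2022): route the rule
# tensor through a shared rank-R dimension so it need not be stored densely.
U = torch.softmax(torch.randn(N, R), dim=-1)  # parent -> rank component
V = torch.softmax(torch.randn(R, N), dim=-1)  # rank component -> left child
W = torch.softmax(torch.randn(R, N), dim=-1)  # rank component -> right child
low_rank_rules = torch.einsum('ar,rb,rc->abc', U, V, W)

# SimplePCFG-style: left and right children are generated independently
# given the parent, so two N x N matrices replace the full rule tensor.
left = torch.softmax(torch.randn(N, N), dim=-1)   # p(A -> B .)
right = torch.softmax(torch.randn(N, N), dim=-1)  # p(A -> . C)
simple_rules = torch.einsum('ab,ac->abc', left, right)

# Each parameterization yields a valid distribution over (B, C) given A.
# (The factorized variants are materialized densely here only for this check.)
for rules in (full_rules, low_rank_rules, simple_rules):
    assert torch.allclose(rules.sum(dim=(1, 2)), torch.ones(N), atol=1e-5)
```

The practical point is that the independent factorization, like the low-rank one, avoids ever materializing the N × N × N rule tensor inside the dynamic program, which is what makes scaling to thousands of states feasible.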