“…Despite the improvements in unsupervised parsing obtained through scaling neural probabilistic context-free grammars (PCFGs), their language model performance scales less favorably than that of, for example, hidden Markov models (HMMs) and neural language models. On the Penn Treebank, a neural PCFG with 30 nonterminals and 60 preterminals obtains ≈ 250 perplexity (Kim et al., 2019), and while scaling neural PCFGs to thousands of states via a low-rank parameterization can improve perplexity to ≈ 170 (Yang et al., 2022), this still lags behind a similarly sized HMM, which obtains ≈ 130 perplexity (Chiu et al., 2021), despite the fact that HMMs are a subclass of PCFGs. This work proposes SimplePCFG, a simple PCFG formalism with independent left and right productions. We find that this simple PCFG scales more effectively (in terms of both language modeling and unsupervised parsing) than previous approaches that scale PCFGs by factorizing the rule probability tensor into low-rank components (Yang et al., 2021b, 2022).…”
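To make the three parameterizations concrete, below is a minimal sketch in PyTorch contrasting a full binary-rule tensor, a low-rank factorization in the spirit of Yang et al. (2022), and independent left/right productions as described for SimplePCFG. The state sizes `N` and `R` are illustrative values, not taken from the cited papers, and the actual models parameterize these distributions with neural networks rather than free logits.

```python
import torch

N = 32  # number of nonterminals (illustrative, not from the paper)
R = 16  # rank of the low-rank factorization (illustrative)

# Full PCFG: one probability per binary rule A -> B C,
# an N x N x N tensor normalized over child pairs (B, C).
full_logits = torch.randn(N, N, N)
full_rules = torch.softmax(full_logits.view(N, -1), dim=-1).view(N, N, N)

# Low-rank PCFG (in the spirit of Yang et al., 2022): route the rule
# tensor through a shared rank-R dimension so it need not be stored densely.
U = torch.softmax(torch.randn(N, R), dim=-1)  # parent -> rank component
V = torch.softmax(torch.randn(R, N), dim=-1)  # rank component -> left child
W = torch.softmax(torch.randn(R, N), dim=-1)  # rank component -> right child
low_rank_rules = torch.einsum('ar,rb,rc->abc', U, V, W)

# SimplePCFG-style: left and right children are generated independently
# given the parent, so two N x N matrices replace the full rule tensor.
left = torch.softmax(torch.randn(N, N), dim=-1)   # p(A -> B .)
right = torch.softmax(torch.randn(N, N), dim=-1)  # p(A -> . C)
simple_rules = torch.einsum('ab,ac->abc', left, right)

# Each parameterization yields a valid distribution over (B, C) given A.
# (The factorized variants are materialized densely here only for this check.)
for rules in (full_rules, low_rank_rules, simple_rules):
    assert torch.allclose(rules.sum(dim=(1, 2)), torch.ones(N), atol=1e-5)
```

The practical point is that the independent factorization, like the low-rank one, avoids ever materializing the N × N × N rule tensor inside the dynamic program, which is what makes scaling to thousands of states feasible.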