2020
DOI: 10.1162/tacl_a_00323

Consistent Unsupervised Estimators for Anchored PCFGs

Abstract: Learning probabilistic context-free grammars (PCFGs) from strings is a classic problem in computational linguistics since Horning (1969). Here we present an algorithm based on distributional learning that is a consistent estimator for a large class of PCFGs that satisfy certain natural conditions including being anchored (Stratos et al., 2016). We proceed via a reparameterization of (top–down) PCFGs that we call a bottom–up weighted context-free grammar. We show that if the grammar is anchored and satisfies…
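
For readers unfamiliar with the anchoring condition mentioned in the abstract, the following is a minimal sketch of how it is usually stated: every nonterminal has at least one anchor terminal that no other nonterminal can rewrite to. The function names and data format are hypothetical illustrations, not the paper's estimator.

    from collections import defaultdict

    def find_anchors(lexical_rules):
        """Map each nonterminal to the terminals that only it rewrites to.

        `lexical_rules` is an iterable of (nonterminal, terminal) pairs,
        i.e. rules of the form A -> a.  A terminal a anchors A when A is
        the only nonterminal with a rule A -> a.
        """
        producers = defaultdict(set)      # terminal -> nonterminals producing it
        for nt, term in lexical_rules:
            producers[term].add(nt)
        anchors = defaultdict(set)        # nonterminal -> its anchor terminals
        for term, nts in producers.items():
            if len(nts) == 1:
                anchors[next(iter(nts))].add(term)
        return anchors

    def is_anchored(nonterminals, lexical_rules):
        """True iff every nonterminal has at least one anchor terminal."""
        anchors = find_anchors(lexical_rules)
        return all(anchors[nt] for nt in nonterminals)

    # Tiny example: "cat" anchors N and "runs" anchors V, while "bark" is
    # ambiguous between N and V, so the grammar is still anchored.
    rules = [("N", "cat"), ("N", "bark"), ("V", "runs"), ("V", "bark")]
    print(is_anchored({"N", "V"}, rules))   # True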

Cited by 8 publications (6 citation statements). References 24 publications.

“…They show that syntactic dependencies as annotated in dependency treebanks identify word pairs with especially high mutual information, and give a derivation showing that this is to be expected according to a formalization of the postulates of dependency grammar. The connection between mutual information and syntactic dependency has also been explored in the literature on grammar induction and unsupervised chunking (Clark & Fijalkow, 2020; de Paiva Alves, 1996; Harris, 1955; McCauley & Christiansen, 2019; Yuret, 1998).…”
Section: Discussion (mentioning)
confidence: 99%
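
As a concrete illustration of the mutual-information connection the excerpt describes, here is a small sketch that estimates pointwise mutual information for (head, dependent) word pairs from counts; the function name and data format are hypothetical, and the excerpt's cited works are not reproduced here.

    import math
    from collections import Counter

    def pmi(pairs):
        """Pointwise mutual information for (head, dependent) word pairs.

        `pairs` is a list of (w1, w2) tuples, e.g. dependency arcs read off
        a treebank.  PMI(w1, w2) = log p(w1, w2) / (p(w1) * p(w2)), with all
        probabilities estimated as relative frequencies.
        """
        n = len(pairs)
        joint = Counter(pairs)
        heads = Counter(w1 for w1, _ in pairs)
        deps = Counter(w2 for _, w2 in pairs)
        return {
            (w1, w2): math.log((c / n) / ((heads[w1] / n) * (deps[w2] / n)))
            for (w1, w2), c in joint.items()
        }

    arcs = [("eats", "dog"), ("eats", "dog"), ("eats", "cat"), ("sleeps", "cat")]
    print(pmi(arcs)[("eats", "dog")])   # higher for strongly associated pairs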
“…Because of the known hardness results, some restrictions need to be applied. Recent works include an L* learning algorithm for MDPs [TAB+21] (here the assumption is that states of the MDPs generate an observable output that allows identifying the current state based on the generated input-output sequence), a passive learning algorithm for a subclass of PCFGs obtained by imposing several structural restrictions [CF20, Cla21], and using PDFA learning to obtain an interpretable model of practically black-box models such as recurrent neural networks [WGY19].…”
Section: Discussion (mentioning)
confidence: 99%
“…To generate random PCFGs, we follow the method used by Clark and Fijalkow (2021). Their method involves first generating a synthetic context-free grammar (CFG) with a specified number of terminals, non-terminals, binary rules, and lexical rules.…”
Section: Random PCFG Experiments (mentioning)
confidence: 99%
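
The excerpt above only summarizes the recipe at a high level; the sketch below is one hedged reading of it, not Clark and Fijalkow's actual generator. It samples a fixed number of distinct binary rules A -> B C and lexical rules A -> a over synthetic symbol sets, then normalizes random weights per left-hand side; how the original work assigns rule probabilities is an assumption here.

    import random
    from collections import defaultdict

    def generate_random_pcfg(n_nonterminals, n_terminals, n_binary, n_lexical, seed=0):
        """Sketch: a random PCFG in Chomsky normal form with the requested rule counts."""
        rng = random.Random(seed)
        nts = [f"NT{i}" for i in range(n_nonterminals)]
        terms = [f"t{i}" for i in range(n_terminals)]

        # Sample distinct binary rules A -> B C and lexical rules A -> a.
        binary, lexical = set(), set()
        while len(binary) < n_binary:
            binary.add((rng.choice(nts), (rng.choice(nts), rng.choice(nts))))
        while len(lexical) < n_lexical:
            lexical.add((rng.choice(nts), (rng.choice(terms),)))

        # Group rules by left-hand side and give each a normalized random probability.
        by_lhs = defaultdict(list)
        for lhs, rhs in binary | lexical:
            by_lhs[lhs].append(rhs)
        pcfg = {}
        for lhs, rhss in by_lhs.items():
            weights = [rng.random() for _ in rhss]
            total = sum(weights)
            pcfg[lhs] = {rhs: w / total for rhs, w in zip(rhss, weights)}
        return pcfg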
“…The approach of Clark and Fijalkow (2021) allows us to construct random grammars with a desired number of nonterminals and rules, and we generate grammars having 20 nonterminals, and 100, 400, and 800 rules, respectively. The number of terminals and lexical rules is set to 5000.…”
Section: Data (mentioning)
confidence: 99%
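
Reusing the generate_random_pcfg sketch above, the configuration described in this excerpt (20 nonterminals, 5000 terminals and lexical rules, and 100, 400, or 800 rules) might look as follows; treating the varying rule counts as binary rules is an assumption of this sketch.

    # Hypothetical configuration mirroring the cited setup; reuses the
    # generate_random_pcfg sketch defined earlier.
    for n_rules in (100, 400, 800):
        grammar = generate_random_pcfg(n_nonterminals=20, n_terminals=5000,
                                       n_binary=n_rules, n_lexical=5000,
                                       seed=n_rules)
        print(n_rules, sum(len(rhss) for rhss in grammar.values()))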