2020
DOI: 10.1162/tacl_a_00323

Consistent Unsupervised Estimators for Anchored PCFGs

Abstract: Learning probabilistic context-free grammars (PCFGs) from strings is a classic problem in computational linguistics since Horning (1969). Here we present an algorithm based on distributional learning that is a consistent estimator for a large class of PCFGs that satisfy certain natural conditions including being anchored (Stratos et al., 2016). We proceed via a reparameterization of (top–down) PCFGs that we call a bottom–up weighted context-free grammar. We show that if the grammar is anchored and satisfies…
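
For readers unfamiliar with the anchoring condition mentioned in the abstract, the following is a minimal sketch of how it is usually stated: every nonterminal has at least one anchor terminal that no other nonterminal can rewrite to. The function names and data format are hypothetical illustrations, not the paper's estimator.

    from collections import defaultdict

    def find_anchors(lexical_rules):
        """Map each nonterminal to the terminals that only it rewrites to.

        `lexical_rules` is an iterable of (nonterminal, terminal) pairs,
        i.e. rules of the form A -> a.  A terminal a anchors A when A is
        the only nonterminal with a rule A -> a.
        """
        producers = defaultdict(set)      # terminal -> nonterminals producing it
        for nt, term in lexical_rules:
            producers[term].add(nt)
        anchors = defaultdict(set)        # nonterminal -> its anchor terminals
        for term, nts in producers.items():
            if len(nts) == 1:
                anchors[next(iter(nts))].add(term)
        return anchors

    def is_anchored(nonterminals, lexical_rules):
        """True iff every nonterminal has at least one anchor terminal."""
        anchors = find_anchors(lexical_rules)
        return all(anchors[nt] for nt in nonterminals)

    # Tiny example: "cat" anchors N and "runs" anchors V, while "bark" is
    # ambiguous between N and V, so the grammar is still anchored.
    rules = [("N", "cat"), ("N", "bark"), ("V", "runs"), ("V", "bark")]
    print(is_anchored({"N", "V"}, rules))   # True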

Cited by 8 publications (6 citation statements). References 24 publications.

“…They show that syntactic dependencies as annotated in dependency treebanks identify word pairs with especially high mutual information, and give a derivation showing that this is to be expected according to a formalization of the postulates of dependency grammar. The connection between mutual information and syntactic dependency has also been explored in the literature on grammar induction and unsupervised chunking (Clark & Fijalkow, 2020; de Paiva Alves, 1996; Harris, 1955; McCauley & Christiansen, 2019; Yuret, 1998).…”
Section: Discussion (mentioning)
confidence: 99%
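
As a concrete illustration of the mutual-information connection the excerpt describes, here is a small sketch that estimates pointwise mutual information for (head, dependent) word pairs from counts; the function name and data format are hypothetical, and the excerpt's cited works are not reproduced here.

    import math
    from collections import Counter

    def pmi(pairs):
        """Pointwise mutual information for (head, dependent) word pairs.

        `pairs` is a list of (w1, w2) tuples, e.g. dependency arcs read off
        a treebank.  PMI(w1, w2) = log p(w1, w2) / (p(w1) * p(w2)), with all
        probabilities estimated as relative frequencies.
        """
        n = len(pairs)
        joint = Counter(pairs)
        heads = Counter(w1 for w1, _ in pairs)
        deps = Counter(w2 for _, w2 in pairs)
        return {
            (w1, w2): math.log((c / n) / ((heads[w1] / n) * (deps[w2] / n)))
            for (w1, w2), c in joint.items()
        }

    arcs = [("eats", "dog"), ("eats", "dog"), ("eats", "cat"), ("sleeps", "cat")]
    print(pmi(arcs)[("eats", "dog")])   # higher for strongly associated pairs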
“…Because of the known hardness results, some restrictions need to be applied. Recent works include an L* learning algorithm for MDPs [TAB+21] (here the assumption is that states of the MDPs generate an observable output that allows identifying the current state based on the generated input-output sequence), a passive learning algorithm for a subclass of PCFGs obtained by imposing several structural restrictions [CF20, Cla21], and using PDFA learning to obtain an interpretable model of practically black-box models such as recurrent neural networks [WGY19].…”
Section: Discussion (mentioning)
confidence: 99%
“…To generate random PCFGs, we follow the method used by Clark and Fijalkow (2021). Their method involves first generating a synthetic context-free grammar (CFG) with a specified number of terminals, non-terminals, binary rules, and lexical rules.…”
Section: Random PCFG Experiments (mentioning)
confidence: 99%
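
The excerpt above only summarizes the recipe at a high level; the sketch below is one hedged reading of it, not Clark and Fijalkow's actual generator. It samples a fixed number of distinct binary rules A -> B C and lexical rules A -> a over synthetic symbol sets, then normalizes random weights per left-hand side; how the original work assigns rule probabilities is an assumption here.

    import random
    from collections import defaultdict

    def generate_random_pcfg(n_nonterminals, n_terminals, n_binary, n_lexical, seed=0):
        """Sketch: a random PCFG in Chomsky normal form with the requested rule counts."""
        rng = random.Random(seed)
        nts = [f"NT{i}" for i in range(n_nonterminals)]
        terms = [f"t{i}" for i in range(n_terminals)]

        # Sample distinct binary rules A -> B C and lexical rules A -> a.
        binary, lexical = set(), set()
        while len(binary) < n_binary:
            binary.add((rng.choice(nts), (rng.choice(nts), rng.choice(nts))))
        while len(lexical) < n_lexical:
            lexical.add((rng.choice(nts), (rng.choice(terms),)))

        # Group rules by left-hand side and give each a normalized random probability.
        by_lhs = defaultdict(list)
        for lhs, rhs in binary | lexical:
            by_lhs[lhs].append(rhs)
        pcfg = {}
        for lhs, rhss in by_lhs.items():
            weights = [rng.random() for _ in rhss]
            total = sum(weights)
            pcfg[lhs] = {rhs: w / total for rhs, w in zip(rhss, weights)}
        return pcfg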
“…The approach of Clark and Fijalkow (2021) allows us to construct random grammars with a desired number of nonterminals and rules, and we generate grammars having 20 nonterminals, and 100, 400, and 800 rules, respectively. The number of terminals and lexical rules is set to 5000.…”
Section: Data (mentioning)
confidence: 99%
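
Reusing the generate_random_pcfg sketch above, the configuration described in this excerpt (20 nonterminals, 5000 terminals and lexical rules, and 100, 400, or 800 rules) might look as follows; treating the varying rule counts as binary rules is an assumption of this sketch.

    # Hypothetical configuration mirroring the cited setup; reuses the
    # generate_random_pcfg sketch defined earlier.
    for n_rules in (100, 400, 800):
        grammar = generate_random_pcfg(n_nonterminals=20, n_terminals=5000,
                                       n_binary=n_rules, n_lexical=5000,
                                       seed=n_rules)
        print(n_rules, sum(len(rhss) for rhss in grammar.values()))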