“…We observe that even on PTB, there is enough variation in setups across prior work on grammar induction to render a meaningful comparison difficult. Some important dimensions along which prior works vary include: (1) lexicalization: earlier work on grammar induction generally assumed gold (or induced) part-of-speech tags (Klein and Manning, 2004; Smith and Eisner, 2004; Bod, 2006; Snyder et al., 2009), while more recent works induce grammars directly from words (Spitkovsky et al., 2013; Shen et al., 2018); (2) use of punctuation: even among papers that induce a grammar directly from words, some employ heuristics based on punctuation, since punctuation is usually a strong signal for the start/end of constituents (Seginer, 2007; Ponvert et al., 2011; Spitkovsky et al., 2013), some train with punctuation (Jin et al., 2018; Drozdov et al., 2019; Kim et al., 2019), while others discard punctuation altogether for training (Shen et al., 2018, 2019); (3) train/test data: some works do not explicitly separate out train/test sets (Reichart and Rappoport, 2010; Golland et al., 2012), while others do (Huang et al., 2012; Parikh et al., 2014; Htut et al., 2018). Maintaining train/test splits is less of an issue for unsupervised structure learning; however, in this work we follow the latter approach and separate train/test data.…”