Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2019)
DOI: 10.18653/v1/n19-1115

Cooperative Learning of Disjoint Syntax and Semantics

Abstract: There has been considerable attention devoted to models that learn to jointly infer an expression's syntactic structure and its semantics. Yet, Nangia and Bowman (2018) have recently shown that the current best systems fail to learn the correct parsing strategy on mathematical expressions generated from a simple context-free grammar. In this work, we present a recursive model inspired by Choi et al. (2018) that reaches near perfect accuracy on this task. Our model is composed of two separated modules for syntax…
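The task referenced above is the one of Nangia and Bowman (2018): prefix expressions over a small context-free grammar whose value can only be computed by recovering the correct tree structure. A minimal illustrative sketch of such an evaluator (the operator set and bracketed prefix format follow the ListOps style; this is not the paper's implementation):

```python
# Evaluate a bracketed prefix expression in the ListOps style,
# e.g. "[MAX 2 9 [MIN 4 7 ] 0 ]" -> 9. Correctly evaluating such
# expressions requires recovering the underlying tree structure.
def evaluate(tokens):
    ops = {'[MAX': max, '[MIN': min,
           '[MED': lambda *a: sorted(a)[len(a) // 2],
           '[SM': lambda *a: sum(a) % 10}

    def parse(i):
        tok = tokens[i]
        if tok in ops:
            args, i = [], i + 1
            while tokens[i] != ']':
                val, i = parse(i)
                args.append(val)
            return ops[tok](*args), i + 1  # skip the closing ']'
        return int(tok), i + 1

    value, _ = parse(0)
    return value

expr = '[MAX 2 9 [MIN 4 7 ] 0 ]'.split()
result = evaluate(expr)
```

Here `result` is 9: the inner `[MIN 4 7 ]` reduces to 4, and `MAX(2, 9, 4, 0)` is 9.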

Cited by 41 publications (47 citation statements) · References 28 publications (34 reference statements)
“…In other words, it is not capable of compositional learning. One possible route to alleviate this problem could include separating syntax and semantics, as is customary in formal semantic methods (Partee et al., 1990) and, as recently suggested in the context of latent tree learning (Havrylov et al., 2019), so that syntax can guide semantics both in processing and learning.…”
Section: Discussion
confidence: 99%
“…This will be even more of a problem if we would attempt to use it in the joint learning setup. Also note that similar parsing models do not yield linguistically-plausible structures when used in the conventional (i.e., non-grounded) grammar-induction setups (Williams et al., 2018; Havrylov et al., 2019).…”
Section: Limitations of the VG-NSL Framework
confidence: 99%
“…An established method is the score function estimator (SFE) (Glynn, 1990; Williams, 1992; Kleijnen and Rubinstein, 1996). SFE is widely used in NLP, for tasks including minimum risk training in NMT (Shen et al., 2016; Wu et al., 2018) and latent linguistic structure learning (Havrylov et al., 2019). In this paper, we focus on the alternative strategy of surrogate gradients, which allows learning in deterministic graphs with discrete, argmax-like nodes, rather than in stochastic graphs.…”
Section: Related Work
confidence: 99%
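The score function estimator mentioned in the last citation statement (also known as REINFORCE) estimates the gradient of an expected reward through a discrete sampling step via the identity ∇E[f(x)] = E[f(x) ∇log p(x)]. A minimal NumPy sketch for a categorical distribution, assuming a toy reward function purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def sfe_gradient(logits, reward_fn, n_samples=10000):
    """Monte Carlo estimate of d E[reward] / d logits using the
    score function: E[ reward(x) * d log p(x) / d logits ]."""
    p = softmax(logits)
    grad = np.zeros_like(logits)
    for _ in range(n_samples):
        x = rng.choice(len(p), p=p)
        # For a softmax, d log p(x) / d logits = one_hot(x) - p
        score = -p.copy()
        score[x] += 1.0
        grad += reward_fn(x) * score
    return grad / n_samples

# Toy reward: category 2 pays 1, others pay 0; the estimate
# pushes probability mass toward category 2.
logits = np.zeros(3)
g = sfe_gradient(logits, lambda x: 1.0 if x == 2 else 0.0)
```

Note the estimator needs only samples and their rewards, never a derivative through the discrete choice; this is what makes it applicable to latent structure learning, at the cost of the high variance that motivates the surrogate-gradient alternative discussed in the quoted paper.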