Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1337

Scalable Syntax-Aware Language Models Using Knowledge Distillation

Abstract: Prior work has shown that, on small amounts of training data, syntactic neural language models learn structurally sensitive generalisations more successfully than sequential language models. However, their computational complexity renders scaling difficult, and it remains an open question whether structural biases are still necessary when sequential models have access to ever larger amounts of training data. To answer this question, we introduce an efficient knowledge distillation (KD) technique that transfers…
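As a rough illustration of the distillation objective sketched in the abstract, the snippet below shows a minimal word-level KD loss in PyTorch, assuming a teacher that exposes next-word logits over the same vocabulary as the student; the mixing weight alpha, the tensor shapes, and the padding handling are illustrative assumptions, not the authors' exact training setup.

```python
# Minimal sketch of word-level knowledge distillation for language modelling.
# alpha, pad_id and the tensor shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, alpha=0.5, pad_id=0):
    """student_logits, teacher_logits: (batch, seq_len, vocab); targets: (batch, seq_len)."""
    vocab = student_logits.size(-1)
    # Soft targets: KL divergence between the teacher's and the student's
    # next-word distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits.reshape(-1, vocab), dim=-1),
        F.softmax(teacher_logits.reshape(-1, vocab), dim=-1),
        reduction="batchmean",
    )
    # Hard targets: ordinary cross-entropy against the observed next words.
    ce = F.cross_entropy(
        student_logits.reshape(-1, vocab), targets.reshape(-1), ignore_index=pad_id
    )
    return alpha * kd + (1.0 - alpha) * ce
```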

Cited by 28 publications (43 citation statements). References 42 publications (55 reference statements).
“…Another relevant work on the capacity of LSTM-LMs is Kuncoro et al. (2019), which shows that by distilling from syntactic LMs (Dyer et al., 2016), LSTM-LMs can improve their robustness on various agreement phenomena. We show that our LMs with the margin loss outperform theirs in most of the aspects, further strengthening the argument about a stronger capacity of LSTM-LMs.…”
Section: Past Work Conceptually Similar to Us (citation type: mentioning, confidence: 99%)
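For context on the "margin loss" mentioned in the excerpt above, the following is a generic hinge-style margin over the log-probabilities of a grammatical versus an ungrammatical continuation (e.g. singular vs. plural verb form); the exact formulation in the citing paper may differ, and the identifiers are placeholders.

```python
# Generic hinge-style margin loss over a correct vs. incorrect word at the
# target position; an illustrative sketch, not the citing paper's exact loss.
import torch

def margin_loss(log_probs: torch.Tensor, correct_id: int, wrong_id: int,
                margin: float = 1.0) -> torch.Tensor:
    """log_probs: (vocab,) log-probabilities at the position of the target word."""
    diff = log_probs[correct_id] - log_probs[wrong_id]
    # Penalise the model unless the correct form beats the wrong form by `margin`.
    return torch.clamp(margin - diff, min=0.0)
```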
“…Training data: Following the practice, we train LMs on a dataset not directly relevant to the test set. Throughout the paper, we use an English Wikipedia corpus assembled by Gulordava et al. (2018), which has been used as training data for the present task (Marvin and Linzen, 2018; Kuncoro et al., 2019), consisting of 80M/10M/10M tokens for the training/dev/test sets. It is tokenized and rare words are replaced by a single unknown token, amounting to a vocabulary size of 50,000.…”
Section: Language Models (citation type: mentioning, confidence: 99%)
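The preprocessing described in this excerpt (tokenised Wikipedia text, rare words mapped to a single unknown token, a 50,000-word vocabulary) can be sketched roughly as below; the <unk> symbol and function names are placeholders, not the released preprocessing script.

```python
# Sketch of vocabulary truncation: keep the 50,000 most frequent tokens and
# map everything else to a single unknown token. Names are placeholders.
from collections import Counter

def build_vocab(tokenised_lines, size=50_000, unk="<unk>"):
    counts = Counter(tok for line in tokenised_lines for tok in line.split())
    keep = {tok for tok, _ in counts.most_common(size - 1)}  # reserve one slot for <unk>
    return keep, unk

def apply_vocab(line, keep, unk="<unk>"):
    return " ".join(tok if tok in keep else unk for tok in line.split())
```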
“…Our work is also closely related to Kuncoro et al. (2019), who distill syntactic structure knowledge into a student LSTM model. The difference lies in that they focus on transferring tree knowledge from a syntax-aware language model for achieving scalable unsupervised syntax induction, while we aim at integrating heterogeneous syntax for improving downstream tasks.…”
Section: Knowledge Distillation (citation type: mentioning, confidence: 98%)
“…Sequential models have been proven effective at encoding syntactic tree information (Shen et al., 2018; Kuncoro et al., 2019). We set the goal of KD as simultaneously distilling heterogeneous structures from tree encoder teachers into an LSTM student model.…”
Section: Heterogeneous Structure Distillation (citation type: mentioning, confidence: 99%)
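As a rough sketch of the multi-teacher setting described in this excerpt, the snippet below sums per-teacher KL terms against the student's next-word distribution; the uniform teacher weighting and function signature are assumptions, not the citing paper's exact objective.

```python
# Sketch of distilling several structure-aware teachers into one student LM
# by combining per-teacher KL terms; weighting scheme is an assumption.
import torch
import torch.nn.functional as F

def multi_teacher_kd(student_logits, teacher_logits_list, weights=None):
    """student_logits: (batch, seq_len, vocab); each teacher has the same shape."""
    vocab = student_logits.size(-1)
    log_q = F.log_softmax(student_logits.reshape(-1, vocab), dim=-1)
    if weights is None:
        weights = [1.0 / len(teacher_logits_list)] * len(teacher_logits_list)
    loss = 0.0
    for w, t_logits in zip(weights, teacher_logits_list):
        p = F.softmax(t_logits.reshape(-1, vocab), dim=-1)
        # Weighted KL from each teacher's distribution to the student's.
        loss = loss + w * F.kl_div(log_q, p, reduction="batchmean")
    return loss
```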