Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.591

Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach

Abstract: It is commonly believed that knowledge of syntactic structure should improve language modeling. However, effectively and computationally efficiently incorporating syntactic structure into neural language models has been a challenging topic. In this paper, we make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "syntactic distances", where information between these two separate objectives shares the same intermediate representation…
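The multi-task setup described in the abstract can be pictured with a short sketch: a single encoder feeds both a next-word prediction head and a head that regresses ground-truth syntactic distances, and the two losses are summed. This is a minimal illustration rather than the authors' code; the plain LSTM encoder, the MSE distance loss, and the names SharedEncoderLM, hidden_dim, and alpha are simplifying assumptions (the paper's model is ONLSTM-based, as noted in the citation statements below).

```python
# Minimal sketch of the multi-task idea in the abstract: one shared encoder feeds
# both a next-word head and a syntactic-distance head, and the two losses are summed.
# Assumptions (not from the paper): plain LSTM encoder, MSE distance loss, and the
# names SharedEncoderLM / hidden_dim / alpha.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoderLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared intermediate representation used by both objectives.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)  # next-word prediction
        self.dist_head = nn.Linear(hidden_dim, 1)         # one distance per adjacent-token gap

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))            # (batch, seq, hidden)
        logits = self.lm_head(h)                           # (batch, seq, vocab)
        # Predict the syntactic distance of each gap from the state of its left token.
        distances = self.dist_head(h[:, :-1]).squeeze(-1)  # (batch, seq - 1)
        return logits, distances

def multi_task_loss(model, tokens, next_tokens, gold_distances, alpha=1.0):
    """LM cross-entropy plus a regression loss on ground-truth syntactic distances."""
    logits, pred_dist = model(tokens)
    lm_loss = F.cross_entropy(logits.transpose(1, 2), next_tokens)
    dist_loss = F.mse_loss(pred_dist, gold_distances)
    return lm_loss + alpha * dist_loss
```

In this reading, alpha simply controls how strongly the ground-truth trees regularize the shared representation; the LSTM here stands in only for the shared encoder, not for the paper's specific architecture.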

Cited by 12 publications (9 citation statements) · References 50 publications

“…The goal is to examine the extent of the connection between linguistic dependency and statistical dependence according to models that have explicitly been designed to have linguistically-oriented inductive bias. Following Du et al (2020), we include two models: an ordered-neuron LSTM (ONLSTM; Shen et al, 2018), a model designed to have a hierarchical structural bias, and trained on raw text data as a language model, as well as an ONLSTM trained on the same data but with an additional auxiliary objective to reconstruct PTB syntax trees (ONLSTM-SYD).…”
Section: Methods (mentioning)
confidence: 99%
“…Finally, our approach relates to other works that propose ways of incorporating structural information into Transformer-based models. This includes the use of dependency or tree structure for constraining self-attention patterns (Strubell et al., 2018; Wang et al., 2019), guiding cross-attention (Chen et al., 2018; Astudillo et al., 2020), modelling syntactic distance (Du et al., 2020), using syntactic information to guide the computation flow in the model (Shen et al., 2021), or through knowledge distillation (Kuncoro et al., 2020). Our structured masking in parsing as language modeling approach is close in spirit to the methods that modify the attention mechanism according to syntactic connections (Astudillo et al., 2020). This work, however, primarily aims to study the impact of structural guidance on syntactic generalization.…”
Section: Related Work (mentioning)
confidence: 99%
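The structured-masking idea referenced in the excerpt above (restricting self-attention according to syntactic connections) can be pictured with a short, hedged sketch. It is not any cited paper's implementation; the function name syntax_masked_attention and the binary syntax_mask input are assumptions.

```python
# Hedged illustration of syntax-constrained self-attention: scores are computed as
# usual, then positions that are not syntactically connected are masked out before
# the softmax. Assumes syntax_mask keeps the diagonal set to 1 so that no row is
# fully masked (which would produce NaNs).
import torch
import torch.nn.functional as F

def syntax_masked_attention(q, k, v, syntax_mask):
    """q, k, v: (batch, seq, dim); syntax_mask: (batch, seq, seq), 1 = attention allowed."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(syntax_mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```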
“…To evaluate SOM's performance on incremental parsing, we trained and evaluated our models on the standard PTB constituency trees. Baseline models include: a) a standard incremental shift-reduce parser with one-step look-ahead; b) an incremental shift-reduce parser equipped with our prediction network and trained on the same dynamic oracle and language model loss as our model; c) the recently proposed ONLSTM-SYD model (Du et al., 2020), which is also trained on both language model and parsing losses; d) unsupervised ONLSTM; e) unsupervised PRPN. As shown in Table 5, SOMs outperform all baseline models, including the shift-reduce parser that has the same extra components as SOMs.…”
Section: Syntactic Generalization (mentioning)
confidence: 99%