Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.591

Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach

Abstract: It is commonly believed that knowledge of syntactic structure should improve language modeling. However, effectively and computationally efficiently incorporating syntactic structure into neural language models has been a challenging topic. In this paper, we make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "syntactic distances", where information between these two separate objectives shares the same intermediate representation…
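The multi-task setup described in the abstract can be pictured with a short sketch: a single encoder feeds both a next-word prediction head and a head that regresses ground-truth syntactic distances, and the two losses are summed. This is a minimal illustration rather than the authors' code; the plain LSTM encoder, the MSE distance loss, and the names SharedEncoderLM, hidden_dim, and alpha are simplifying assumptions (the paper's model is ONLSTM-based, as noted in the citation statements below).

```python
# Minimal sketch of the multi-task idea in the abstract: one shared encoder feeds
# both a next-word head and a syntactic-distance head, and the two losses are summed.
# Assumptions (not from the paper): plain LSTM encoder, MSE distance loss, and the
# names SharedEncoderLM / hidden_dim / alpha.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoderLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared intermediate representation used by both objectives.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)  # next-word prediction
        self.dist_head = nn.Linear(hidden_dim, 1)         # one distance per adjacent-token gap

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))            # (batch, seq, hidden)
        logits = self.lm_head(h)                           # (batch, seq, vocab)
        # Predict the syntactic distance of each gap from the state of its left token.
        distances = self.dist_head(h[:, :-1]).squeeze(-1)  # (batch, seq - 1)
        return logits, distances

def multi_task_loss(model, tokens, next_tokens, gold_distances, alpha=1.0):
    """LM cross-entropy plus a regression loss on ground-truth syntactic distances."""
    logits, pred_dist = model(tokens)
    lm_loss = F.cross_entropy(logits.transpose(1, 2), next_tokens)
    dist_loss = F.mse_loss(pred_dist, gold_distances)
    return lm_loss + alpha * dist_loss
```

In this reading, alpha simply controls how strongly the ground-truth trees regularize the shared representation; the LSTM here stands in only for the shared encoder, not for the paper's specific architecture.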

Cited by 12 publications (9 citation statements) · References 50 publications

“…The goal is to examine the extent of the connection between linguistic dependency and statistical dependence according to models that have explicitly been designed to have linguistically-oriented inductive bias. Following Du et al (2020), we include two models: an ordered-neuron LSTM (ONLSTM; Shen et al, 2018), a model designed to have a hierarchical structural bias, and trained on raw text data as a language model, as well as an ONLSTM trained on the same data but with an additional auxiliary objective to reconstruct PTB syntax trees (ONLSTM-SYD).…”
Section: Methods (mentioning)
confidence: 99%
“…Finally, our approach relates to other works that propose ways of incorporating structural information into Transformer-based models. This includes the use of dependency or tree structure for constraining self-attention patterns (Strubell et al., 2018; Wang et al., 2019), guiding cross-attention (Chen et al., 2018; Astudillo et al., 2020), modelling syntactic distance (Du et al., 2020), using syntactic information to guide the computation flow in the model (Shen et al., 2021), or through knowledge distillation (Kuncoro et al., 2020). Our structured masking in parsing as language modeling approach is close in spirit to the methods that modify the attention mechanism according to syntactic connections (Astudillo et al., 2020). This work, however, primarily aims to study the impact of structural guidance on syntactic generalization.…”
Section: Related Work (mentioning)
confidence: 99%
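The structured-masking idea referenced in the excerpt above (restricting self-attention according to syntactic connections) can be pictured with a short, hedged sketch. It is not any cited paper's implementation; the function name syntax_masked_attention and the binary syntax_mask input are assumptions.

```python
# Hedged illustration of syntax-constrained self-attention: scores are computed as
# usual, then positions that are not syntactically connected are masked out before
# the softmax. Assumes syntax_mask keeps the diagonal set to 1 so that no row is
# fully masked (which would produce NaNs).
import torch
import torch.nn.functional as F

def syntax_masked_attention(q, k, v, syntax_mask):
    """q, k, v: (batch, seq, dim); syntax_mask: (batch, seq, seq), 1 = attention allowed."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(syntax_mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```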
“…To evaluate SOM's performance on incremental parsing, we trained and evaluated our models on the standard PTB constituency trees. Baseline models include: a) a standard incremental shift-reduce parser with one-step look-ahead; b) an incremental shift-reduce parser equipped with our prediction network and trained on the same dynamic oracle and language model loss as our model; c) the recently proposed ONLSTM-SYD model (Du et al., 2020), which is also trained on both language model and parsing losses; d) unsupervised ONLSTM; e) unsupervised PRPN. As shown in Table 5, SOMs outperform all baseline models, including the shift-reduce parser that has the same extra components as SOMs.…”
Section: Syntactic Generalization (mentioning)
confidence: 99%