2021
DOI: 10.1162/tacl_a_00358

Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement

Abstract: We propose the Recursive Non-autoregressive Graph-to-Graph Transformer architecture (RNGTr) for the iterative refinement of arbitrary graphs through the recursive application of a non-autoregressive Graph-to-Graph Transformer and apply it to syntactic dependency parsing. We demonstrate the power and effectiveness of RNGTr on several dependency corpora, using a refinement model pre-trained with BERT. We also introduce Syntactic Transformer (SynTr), a non-recursive parser similar to our refinement model. RNGTr c…
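
The refinement loop described in the abstract can be pictured with a short sketch: a non-autoregressive parsing step is applied recursively, each time conditioning on the previously predicted graph, until the prediction stops changing or an iteration budget is exhausted. The names below (`Graph`, `parse_step`, `rng_parse`, `max_iters`) and the toy left-attachment stand-in for the parser are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of recursive non-autoregressive refinement (assumed names,
# not the authors' code): predict a full dependency graph in one pass,
# condition on it, and re-predict until the graph stops changing.
from typing import List, Tuple

# A graph is a pair (heads, labels); heads[i] is the head index of token i.
Graph = Tuple[Tuple[int, ...], Tuple[str, ...]]

def parse_step(sentence: List[str], graph: Graph) -> Graph:
    """One non-autoregressive pass. In RNGTr this would be a Graph-to-Graph
    Transformer conditioned on `graph`; here a toy rule (attach every token
    to its left neighbour) stands in so the sketch runs."""
    heads = tuple(max(i - 1, 0) for i in range(len(sentence)))
    labels = tuple("dep" for _ in sentence)
    return (heads, labels)

def rng_parse(sentence: List[str], max_iters: int = 3) -> Graph:
    # Start from a trivial initial graph (or an initial parser's output).
    graph: Graph = (tuple(0 for _ in sentence), tuple("root" for _ in sentence))
    for _ in range(max_iters):
        new_graph = parse_step(sentence, graph)
        if new_graph == graph:   # stop once the prediction is stable
            break
        graph = new_graph
    return graph

print(rng_parse(["The", "parser", "refines", "its", "graph"]))
```

The key property is that each pass re-predicts all arcs in parallel (non-autoregressively) while conditioning on the complete graph from the previous pass.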

Cited by 17 publications (17 citation statements). References: 46 publications (75 reference statements).
“…In Table 4, we analyse the interaction of the dependency graph with key and query vectors in the attention mechanism, as defined in Equation 6. Excluding the key interaction results in a similar attention score mechanism as defined in Mohammadshahi and Henderson (2020b). This SynG2G-Tr-key model achieves similar results compared to the SynG2G-Tr model on the WSJ test dataset, but the SynG2G-Tr model outperforms it on the development set and both types of out-of-domain datasets, confirming that the key interaction is a critical part of the SynG2G-Tr model.…”
Section: Ablation Study (mentioning)
confidence: 56%
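
The interaction discussed in this excerpt can be sketched as attention scores with graph-relation embeddings entering through a query-side term and, additionally, a key-side term; dropping the key-side term gives, roughly, the ablated variant without the key interaction. The exact form of the cited Equation 6 is not reproduced here; the tensor names, shapes, and parameterisation below are illustrative assumptions.

```python
# Hedged sketch: unnormalised attention scores with graph-relation embeddings
# added to both the query and the key interaction (assumed shapes and names,
# not the cited paper's exact Equation 6).
import numpy as np

def graph_attention_scores(Q, K, R_q, R_k, use_key_interaction=True):
    """Q, K:      (n, d) query / key vectors for n tokens.
    R_q, R_k:     (n, n, d) embeddings of the graph relation between tokens
                  i and j (e.g. dependency label and direction), one tensor
                  for the query-side and one for the key-side interaction.
    Returns an (n, n) matrix of unnormalised attention scores."""
    n, d = Q.shape
    content = Q @ K.T                                # standard content term
    query_rel = np.einsum('id,ijd->ij', Q, R_q)      # query-graph interaction
    scores = content + query_rel
    if use_key_interaction:                          # the ablation drops this term
        key_rel = np.einsum('jd,ijd->ij', K, R_k)    # key-graph interaction
        scores = scores + key_rel
    return scores / np.sqrt(d)

# Toy usage with random vectors for a 4-token sentence.
rng = np.random.default_rng(0)
n, d = 4, 8
Q, K = rng.normal(size=(n, d)), rng.normal(size=(n, d))
R_q, R_k = rng.normal(size=(n, n, d)), rng.normal(size=(n, n, d))
full_scores = graph_attention_scores(Q, K, R_q, R_k)
ablated_scores = graph_attention_scores(Q, K, R_q, R_k, use_key_interaction=False)
```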
“…Recently, Mohammadshahi and Henderson (2020a) proposed an architecture called Graph-to-Graph Transformer which allows the input and output of arbitrary graphs. They first applied it to transition-based dependency parsing, for conditioning on the partially constructed dependency graph (Mohammadshahi and Henderson, 2020a), and then to graph-based syntactic parsing with iterative refinement (Mohammadshahi and Henderson, 2020b), where predicted dependency graphs are iteratively corrected. The Graph-to-Graph Transformer architecture inputs graph relations as embeddings incorporated into the self-attention mechanism of the Transformer (Vaswani et al., 2017b), inspired by the way Shaw et al. (2018) encode sequence order with relative position embeddings.…”
Section: Introduction (mentioning)
confidence: 99%
“…We also point out that, if FOP and SOP find equivalently good models on dev, SOP models seem to generalize better. For parsers with BERT, with a simple averaging of BSOP, we achieve comparable performance (or even better in the case of LAS) compared to more involved methods such as (Mohammadshahi and Henderson, 2021). It remains to be seen whether they can also benefit from MoEs.…”
Section: Results on Test (mentioning)
confidence: 85%
“…Another interesting direction that's worth exploring is to use the continuous tree distances predicted by our methods as features for downstream tasks instead of the discrete tree structures produced by conventional parsers. As recent work has been exploring, this differentiable representation of tree structure is potentially useful within the iterative-refinement framework (Mohammadshahi and Henderson, 2020), or as additional tree-specific positional features in a transformer (Omote et al., 2019).…”
Section: Discussion (mentioning)
confidence: 99%