2021
DOI: 10.1162/tacl_a_00358

Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement

Abstract: We propose the Recursive Non-autoregressive Graph-to-Graph Transformer architecture (RNGTr) for the iterative refinement of arbitrary graphs through the recursive application of a non-autoregressive Graph-to-Graph Transformer and apply it to syntactic dependency parsing. We demonstrate the power and effectiveness of RNGTr on several dependency corpora, using a refinement model pre-trained with BERT. We also introduce Syntactic Transformer (SynTr), a non-recursive parser similar to our refinement model. RNGTr c…
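
The refinement loop described in the abstract can be pictured with a short sketch: a non-autoregressive parsing step is applied recursively, each time conditioning on the previously predicted graph, until the prediction stops changing or an iteration budget is exhausted. The names below (`Graph`, `parse_step`, `rng_parse`, `max_iters`) and the toy left-attachment stand-in for the parser are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of recursive non-autoregressive refinement (assumed names,
# not the authors' code): predict a full dependency graph in one pass,
# condition on it, and re-predict until the graph stops changing.
from typing import List, Tuple

# A graph is a pair (heads, labels); heads[i] is the head index of token i.
Graph = Tuple[Tuple[int, ...], Tuple[str, ...]]

def parse_step(sentence: List[str], graph: Graph) -> Graph:
    """One non-autoregressive pass. In RNGTr this would be a Graph-to-Graph
    Transformer conditioned on `graph`; here a toy rule (attach every token
    to its left neighbour) stands in so the sketch runs."""
    heads = tuple(max(i - 1, 0) for i in range(len(sentence)))
    labels = tuple("dep" for _ in sentence)
    return (heads, labels)

def rng_parse(sentence: List[str], max_iters: int = 3) -> Graph:
    # Start from a trivial initial graph (or an initial parser's output).
    graph: Graph = (tuple(0 for _ in sentence), tuple("root" for _ in sentence))
    for _ in range(max_iters):
        new_graph = parse_step(sentence, graph)
        if new_graph == graph:   # stop once the prediction is stable
            break
        graph = new_graph
    return graph

print(rng_parse(["The", "parser", "refines", "its", "graph"]))
```

The key property is that each pass re-predicts all arcs in parallel (non-autoregressively) while conditioning on the complete graph from the previous pass.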

Cited by 17 publications (17 citation statements). References: 46 publications (75 reference statements).
“…In Table 4, we analyse the interaction of the dependency graph with key and query vectors in the attention mechanism, as defined in Equation 6. Excluding the key interaction results in a similar attention score mechanism as defined in Mohammadshahi and Henderson (2020b). This SynG2G-Tr-key model achieves similar results compared to the SynG2G-Tr model on the WSJ test dataset, but the SynG2G-Tr model outperforms it on the development set and both types of out-of-domain datasets, confirming that the key interaction is a critical part of the SynG2G-Tr model.…”
Section: Ablation Study (mentioning)
confidence: 56%
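
The interaction discussed in this excerpt can be sketched as attention scores with graph-relation embeddings entering through a query-side term and, additionally, a key-side term; dropping the key-side term gives, roughly, the ablated variant without the key interaction. The exact form of the cited Equation 6 is not reproduced here; the tensor names, shapes, and parameterisation below are illustrative assumptions.

```python
# Hedged sketch: unnormalised attention scores with graph-relation embeddings
# added to both the query and the key interaction (assumed shapes and names,
# not the cited paper's exact Equation 6).
import numpy as np

def graph_attention_scores(Q, K, R_q, R_k, use_key_interaction=True):
    """Q, K:      (n, d) query / key vectors for n tokens.
    R_q, R_k:     (n, n, d) embeddings of the graph relation between tokens
                  i and j (e.g. dependency label and direction), one tensor
                  for the query-side and one for the key-side interaction.
    Returns an (n, n) matrix of unnormalised attention scores."""
    n, d = Q.shape
    content = Q @ K.T                                # standard content term
    query_rel = np.einsum('id,ijd->ij', Q, R_q)      # query-graph interaction
    scores = content + query_rel
    if use_key_interaction:                          # the ablation drops this term
        key_rel = np.einsum('jd,ijd->ij', K, R_k)    # key-graph interaction
        scores = scores + key_rel
    return scores / np.sqrt(d)

# Toy usage with random vectors for a 4-token sentence.
rng = np.random.default_rng(0)
n, d = 4, 8
Q, K = rng.normal(size=(n, d)), rng.normal(size=(n, d))
R_q, R_k = rng.normal(size=(n, n, d)), rng.normal(size=(n, n, d))
full_scores = graph_attention_scores(Q, K, R_q, R_k)
ablated_scores = graph_attention_scores(Q, K, R_q, R_k, use_key_interaction=False)
```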
“…Recently, Mohammadshahi and Henderson (2020a) proposed an architecture called Graph-to-Graph Transformer which allows the input and output of arbitrary graphs. They first applied it to transition-based dependency parsing, for conditioning on the partially constructed dependency graph (Mohammadshahi and Henderson, 2020a), and then to graph-based syntactic parsing with iterative refinement (Mohammadshahi and Henderson, 2020b), where predicted dependency graphs are iteratively corrected. The Graph-to-Graph Transformer architecture inputs graph relations as embeddings incorporated into the self-attention mechanism of the Transformer (Vaswani et al., 2017b), inspired by the way Shaw et al. (2018) encode sequence order with relative position embeddings.…”
Section: Introduction (mentioning)
confidence: 99%
“…We also point out that, if FOP and SOP find equivalently good models on dev, SOP models seem to generalize better. For parsers with BERT, with a simple averaging of BSOP, we achieve comparable performance (or even better in the case of LAS) compared to more involved methods such as (Mohammadshahi and Henderson, 2021). It remains to be seen whether they can also benefit from MoEs.…”
Section: Results on Test (mentioning)
confidence: 85%
“…Another interesting direction that's worth exploring is to use the continuous tree distances predicted by our methods as features for downstream tasks instead of the discrete tree structures produced by conventional parsers. As recent work has been exploring, this differentiable representation of tree structure is potentially useful within the iterative-refinement framework (Mohammadshahi and Henderson, 2020), or as additional tree-specific positional features in a transformer (Omote et al., 2019).…”
Section: Discussion (mentioning)
confidence: 99%