2019
DOI: 10.1609/aaai.v33i01.330186
Dynamic Layer Aggregation for Neural Machine Translation with Routing-by-Agreement

Abstract: With the promising progress of deep neural networks, layer aggregation has been used to fuse information across layers in various fields, such as computer vision and machine translation. However, most of the previous methods combine layers in a static fashion in that their aggregation strategy is independent of specific hidden states. Inspired by recent progress on capsule networks, in this paper we propose to use routing-by-agreement strategies to aggregate layers dynamically. Specifically, the algorithm lear…
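The abstract describes aggregating the hidden states of multiple encoder layers with iterative routing-by-agreement, in the style of capsule networks: coupling weights over layers are refined so that layers agreeing with the fused representation receive more weight. The following is a minimal NumPy sketch of that idea, not the paper's actual implementation; the function names (`squash`, `routing_aggregate`) and the iteration count are illustrative assumptions.

```python
import numpy as np

def squash(s):
    """Capsule-style nonlinearity: scales the vector so its norm lies in [0, 1)."""
    norm2 = np.sum(s ** 2)
    return (norm2 / (1.0 + norm2)) * s / (np.sqrt(norm2) + 1e-9)

def routing_aggregate(layer_states, num_iters=3):
    """Fuse per-layer hidden states into one vector by routing-by-agreement.

    layer_states: array of shape (L, d), one d-dim state per encoder layer.
    Returns a single d-dim aggregated representation.
    """
    L, _ = layer_states.shape
    b = np.zeros(L)                          # routing logits, one per layer
    v = np.zeros(layer_states.shape[1])
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum()      # softmax coupling coefficients
        s = c @ layer_states                 # weighted sum across layers
        v = squash(s)                        # candidate fused representation
        b += layer_states @ v                # agreement update: dot(u_i, v)
    return v

# Toy usage: fuse 6 layer states of dimension 8 for one position.
states = np.random.RandomState(0).randn(6, 8)
fused = routing_aggregate(states)
print(fused.shape)  # (8,)
```

Because the coupling coefficients depend on the hidden states themselves through the agreement term, the aggregation is dynamic: different inputs yield different layer weightings, unlike a fixed (static) weighted sum of layers.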

Cited by 40 publications (23 citation statements). References 18 publications.
“…A number of recent efforts have explored ways to improve multi-head SAN by encouraging individual attention heads to extract distinct information (Strubell et al., 2018). Concerning the multi-layer SAN encoder, Dou et al. (2018, 2019) propose to aggregate the multi-layer representations, and Dehghani et al. (2019) recurrently refine these representations. Our approach is complementary to theirs, since they focus on improving the representation power of the SAN encoder, while we aim to complement the SAN encoder with an additional recurrence encoder.…”
Section: Short-cut Effect
confidence: 99%
“…Exploiting deep representations has been studied to strengthen feature propagation and encourage feature reuse in NMT (Shen et al., 2018; Dou et al., 2018, 2019; Wang et al., 2019b). All of these works mainly attend the decoder to the final output of the encoder stack; we instead coordinate the encoder and the decoder at an earlier stage.…”
Section: Related Work
confidence: 99%
“…Recent studies show that different encoder layers capture linguistic properties of different levels (Peters et al., 2018), and aggregating layers is of profound value to better fuse semantic information (Shen et al., 2018; Dou et al., 2018; Dou et al., 2019). We assume that different decoder layers may value different levels of information, i.e.…”
Section: Input
confidence: 99%