Proceedings of the Third Conference on Machine Translation: Research Papers 2018
DOI: 10.18653/v1/w18-6326
Input Combination Strategies for Multi-Source Transformer Decoder

Abstract: In multi-source sequence-to-sequence tasks, the attention mechanism can be modeled in several ways. This topic has been thoroughly studied on recurrent architectures. In this paper, we extend the previous work to the encoder-decoder attention in the Transformer architecture. We propose four different input combination strategies for the encoder-decoder attention: serial, parallel, flat, and hierarchical. We evaluate our methods on tasks of multimodal translation and translation with multiple source languages. …
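To make the four strategies concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of a multi-source encoder-decoder attention block built from stock PyTorch modules; the class name MultiSourceAttention, its parameters, and the residual placement are illustrative assumptions rather than details taken from the paper.

```python
# Hedged sketch, not the authors' code: the four input combination strategies
# for encoder-decoder attention, built from stock PyTorch attention modules.
# Shapes: decoder queries (B, T, D), each encoded source (B, S_i, D).
import torch
import torch.nn as nn


class MultiSourceAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_sources=2, strategy="serial"):
        super().__init__()
        self.strategy = strategy
        self.attns = nn.ModuleList(
            [nn.MultiheadAttention(d_model, n_heads, batch_first=True)
             for _ in range(n_sources)]
        )
        # Hierarchical only: a second attention that weighs the per-source contexts.
        self.hier_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, queries, sources):
        if self.strategy == "serial":
            # Attend to each source in turn; each context feeds the next attention.
            x = queries
            for attn, src in zip(self.attns, sources):
                ctx, _ = attn(x, src, src)
                x = x + ctx                      # residual after every sub-layer
            return x
        if self.strategy == "parallel":
            # Attend to all sources with the same query and sum the contexts.
            ctxs = [attn(queries, src, src)[0]
                    for attn, src in zip(self.attns, sources)]
            return queries + torch.stack(ctxs).sum(dim=0)
        if self.strategy == "flat":
            # Concatenate all encoder states and attend over them jointly.
            flat = torch.cat(sources, dim=1)
            ctx, _ = self.attns[0](queries, flat, flat)
            return queries + ctx
        if self.strategy == "hierarchical":
            # Per-source contexts first, then a second attention over those contexts.
            ctxs = [attn(queries, src, src)[0]
                    for attn, src in zip(self.attns, sources)]
            stacked = torch.stack(ctxs, dim=2)   # (B, T, n_sources, D)
            B, T, n_src, D = stacked.shape
            ctx, _ = self.hier_attn(queries.reshape(B * T, 1, D),
                                    stacked.reshape(B * T, n_src, D),
                                    stacked.reshape(B * T, n_src, D))
            return queries + ctx.reshape(B, T, D)
        raise ValueError(f"unknown strategy: {self.strategy}")
```

The flat variant needs only a single attention over the concatenated encoder states; serial and parallel keep one attention per source; hierarchical spends an extra attention layer to learn, per target position, how much each source should contribute.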

Cited by 53 publications (48 citation statements); references 12 publications.
“…Follow-up studies extend the decoder-based visual attention approach in different ways: Calixto et al (2017) reimplement the gating mechanism (Xu et al 2015) to rescale the magnitude of the visual information before the fusion, while Libovický and Helcl (2017) introduce the hierarchical attention which replaces the concatenative fusion with a new attention layer that dynamically weighs the modality-specific context vectors. Finally, Arslan et al (2018) and Libovický et al (2018) introduce the same idea into the Transformer-based (Vaswani et al 2017) architectures. Besides revisiting the hierarchical attention, Libovický et al (2018) also introduce parallel and serial variants.…”
Section: Visual Attention (mentioning)
confidence: 99%
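The quoted passage contrasts two fusion styles: a learned gate that rescales the visual context before fusion (Xu et al. 2015; Calixto et al. 2017) and a hierarchical attention over modality-specific context vectors (the hierarchical branch of the sketch above). Below is a minimal, hypothetical sketch of the gating idea, assuming the visual context has already been projected to the model dimension; VisualGate and its residual fusion are illustrative assumptions, not the cited papers' code.

```python
import torch
import torch.nn as nn


class VisualGate(nn.Module):
    """A scalar gate computed from the decoder state rescales the visual context."""

    def __init__(self, d_model=512):
        super().__init__()
        self.gate = nn.Linear(d_model, 1)

    def forward(self, decoder_states, visual_context):
        # beta in (0, 1), one value per target position, shrinks or passes the modality.
        beta = torch.sigmoid(self.gate(decoder_states))   # (B, T, 1)
        return decoder_states + beta * visual_context     # fusion with rescaled visual part
```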
“…We also experimented with a hierarchical attention mechanism along the lines of Libovický and Helcl (2017) and Libovický et al (2018), but as this did not outperform the simpler combination mechanism in (5) in internal testing, our submitted systems utilized the latter.…”
Section: Multi-encoder Transformer (mentioning)
confidence: 99%
“…3. Training a multi-encoder (Libovický and Helcl, 2017; Libovický et al., 2018) Transformer system (Vaswani et al., 2017) from…”
Section: Introduction (mentioning)
confidence: 99%
“…To prevent errors in the Apertium translation from being propagated to the output, the decoder should focus mostly on the SL input. However, according to the analysis of attention carried out by Libovický et al. (2018), in the serial multi-source architecture of Marian the output seems to be built with information from all inputs. We plan to explore more multi-source architectures in the future.…”
Section: Hybridization With Rule-based Machine Translation (mentioning)
confidence: 99%
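For concreteness, here is a hypothetical continuation of the MultiSourceAttention sketch above for the hybrid setup quoted here, with the encoded source-language sentence and the encoded Apertium translation as the two inputs to a serial combination; all tensors are random placeholders, and whether the decoder then relies on one source or both is exactly what the quoted attention analysis examines.

```python
# Continues the MultiSourceAttention sketch above (hypothetical names and shapes).
import torch

sl_states = torch.randn(1, 20, 512)   # encoded source-language sentence
mt_states = torch.randn(1, 18, 512)   # encoded Apertium translation
dec_states = torch.randn(1, 9, 512)   # decoder states entering cross-attention

serial = MultiSourceAttention(d_model=512, n_heads=8, n_sources=2, strategy="serial")
fused = serial(dec_states, [sl_states, mt_states])   # (1, 9, 512)
```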