2019
DOI: 10.1609/aaai.v33i01.33015466

Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder

Abstract: Sharing source and target side vocabularies and word embeddings has been a popular practice in neural machine translation (briefly, NMT) for similar languages (e.g., English to French or German translation). The success of such word-level sharing motivates us to move one step further: we consider model-level sharing and tie the whole parts of the encoder and decoder of an NMT model. We share the encoder and decoder of Transformer (Vaswani et al. 2017), the state-of-the-art NMT model, and obtain a compact model …
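
To make the model-level sharing concrete, here is a minimal sketch of one way to tie encoder and decoder parameters, assuming PyTorch; the class names, dimensions, and layer layout are illustrative assumptions, not the authors' released code. The decoder block reuses the encoder block's self-attention and feed-forward weights and adds only its own cross-attention.

```python
import torch
import torch.nn as nn

d_model, nhead, dim_ff = 512, 8, 2048  # illustrative dimensions

class SharedLayer(nn.Module):
    """Self-attention + feed-forward block whose weights serve both sides."""
    def __init__(self):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(),
                                 nn.Linear(dim_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, attn_mask=None):
        a, _ = self.self_attn(x, x, x, attn_mask=attn_mask)
        x = self.norm1(x + a)
        return self.norm2(x + self.ffn(x))

class TiedDecoderLayer(nn.Module):
    """Decoder block that borrows the SharedLayer's parameters and adds only
    its own cross-attention over the encoder output."""
    def __init__(self, shared: SharedLayer):
        super().__init__()
        self.shared = shared  # same module object => shared parameters
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, y, memory, tgt_mask=None):
        a, _ = self.shared.self_attn(y, y, y, attn_mask=tgt_mask)
        y = self.shared.norm1(y + a)
        c, _ = self.cross_attn(y, memory, memory)
        y = self.norm(y + c)
        return self.shared.norm2(y + self.shared.ffn(y))

shared = SharedLayer()
memory = shared(torch.randn(2, 7, d_model))                       # encoder side
out = TiedDecoderLayer(shared)(torch.randn(2, 5, d_model), memory)
print(out.shape)  # torch.Size([2, 5, 512])
```

Because the decoder holds a reference to the same SharedLayer object, both sides train one set of weights, which is what makes the tied model compact.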

Cited by 50 publications (19 citation statements) | References 17 publications

“…Limits of sharing parameters: Concurrent studies have shown that sharing the self-attention and feed-forward layer parameters between the encoder and decoder is possible without a great loss in performance (Xia et al., 2019). However, its combination with RS performs badly.…”
Section: Discussion
confidence: 99%
“…Eventually, the RS models have the same size as that of a 1-layer model. Another approach is to share the parameters between the encoder and the decoder (Xia et al., 2019; Dabre and Fujita, 2019). We consider that this approach is orthogonal to our RS and will examine their combination in our future work.…”
Section: Related Work
confidence: 99%
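
For context on what recurrent stacking (RS) refers to above, the following is a minimal sketch, assuming PyTorch; the layer type, depth, and dimensions are illustrative assumptions and the code is not taken from the cited papers. A single layer is reapplied at every depth, so a deep stack keeps the parameter count of a one-layer model.

```python
import torch
import torch.nn as nn

# One Transformer encoder layer whose parameters are reused at every depth.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)

def recurrent_stack(x, layer, depth=6):
    # Apply the same layer (hence the same parameters) `depth` times.
    for _ in range(depth):
        x = layer(x)
    return x

x = torch.randn(2, 7, 512)
out = recurrent_stack(x, layer)
print(out.shape)                                   # torch.Size([2, 7, 512])
print(sum(p.numel() for p in layer.parameters()))  # one layer's parameters
```
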
“…Two tasks were conducted to evaluate the proposed dualformer for machine translation in terms of BLEU [25,26,27,28,29]. The standard Transformer and other dual learning methods were implemented for comparison.…”
Section: Methods
confidence: 99%
“…Our method ties the parameters of multiple models, which is orthogonal to the work that ties parameters between layers (Dabre and Fujita, 2019) and/or between the encoder and decoder within a single model (Xia et al., 2019; Dabre and Fujita, 2019). Parameter tying leads to compact models, but they usually suffer from drops in inference quality.…”
Section: Related Work
confidence: 99%