2018
DOI: 10.48550/arxiv.1808.10681
Preprint
Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation

Cited by 1 publication (4 citation statements)
References 0 publications
“…We call our deep learning model life2vec. The life2vec model is based on a transformer architecture [31,30,44,45,46,47,48,49,50,51]. Transformers are well suited for representing life-sequences due to their ability to compress contextual information [52,53] and take into account temporal and positional information [5,54].…”
Section: The life2vec Model
confidence: 99%
“…Each contextual representation, x_i, is transformed via f_1(x) = tanh(x W_1 + b_1), followed by l2-normalisation, norm(x) = x/∥x∥. The weights of the final layer, f_2, are tied to the embedding matrix, E_V, which is further normalized to preserve only directions [48]. The resulting scores are scaled by α to sharpen the distribution [46].…”
Section: Pre-training: Learning Structure of the Data
confidence: 99%
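The quoted passage describes a tied, l2-normalised output layer: contextual states are projected with a tanh layer, normalised to unit length, scored against the (direction-only) shared embedding matrix, and sharpened by a scale α. Below is a minimal PyTorch sketch of that idea; the class name, the default value of α, and the dimensions are illustrative assumptions, not the cited paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedNormalizedOutputLayer(nn.Module):
    """Sketch of an output layer whose weights are tied to the token embedding
    matrix E_V, with l2-normalised directions and a sharpening scale alpha.
    Names and defaults are assumptions, not the authors' code."""

    def __init__(self, embedding: nn.Embedding, hidden_dim: int, alpha: float = 10.0):
        super().__init__()
        emb_dim = embedding.embedding_dim
        self.embedding = embedding                 # E_V, shared with the input side (weight tying)
        self.f1 = nn.Linear(hidden_dim, emb_dim)   # f_1(x) = tanh(x W_1 + b_1)
        self.alpha = alpha                         # scale that sharpens the output distribution

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim) contextual representations x_i
        h = torch.tanh(self.f1(x))                             # project into the embedding space
        h = F.normalize(h, p=2, dim=-1)                        # norm(x) = x / ||x||
        e = F.normalize(self.embedding.weight, p=2, dim=-1)    # keep only directions of E_V
        logits = self.alpha * (h @ e.t())                      # scaled similarities: (batch, seq, vocab)
        return logits

# Hypothetical usage: feed the logits into a cross-entropy loss over the vocabulary.
vocab_size, emb_dim, hidden_dim = 1000, 64, 128
emb = nn.Embedding(vocab_size, emb_dim)
head = TiedNormalizedOutputLayer(emb, hidden_dim)
scores = head(torch.randn(2, 5, hidden_dim))
```

Because both the projected states and the embedding rows are unit-normalised, the logits are cosine similarities; α plays the role of an inverse temperature that controls how peaked the resulting softmax is.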