2018
DOI: 10.1016/j.neucom.2018.01.007

Fine-grained attention mechanism for neural machine translation

Abstract: Neural machine translation (NMT) has become a new paradigm in machine translation, and the attention mechanism has become the dominant approach, setting state-of-the-art records on many language pairs. While there are variants of the attention mechanism, all of them use only temporal attention, where a single scalar value is assigned to the context vector corresponding to each source word. In this paper, we propose a fine-grained (or 2D) attention mechanism where each dimension of a context vector will receive a separate…
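The abstract contrasts standard temporal attention (one scalar weight per source context vector) with the proposed fine-grained, or 2D, attention (one weight per dimension of each context vector). The sketch below illustrates that contrast with a simplified additive-style scorer in NumPy; the exact scoring function and parameterization in the paper differ, so `Wq`, `Wh`, and the `tanh` scorer here are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def softmax(x, axis=0):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(query, annotations, W):
    """Standard (temporal) attention: one scalar weight per source annotation."""
    # annotations: (T, d) source states, query: (d,) decoder state, W: (d, d)
    scores = annotations @ (W @ query)                 # (T,) one score per position
    alphas = softmax(scores, axis=0)                   # normalize over source positions
    return (alphas[:, None] * annotations).sum(axis=0) # (d,) context vector

def fine_grained_attention(query, annotations, Wq, Wh):
    """2D attention sketch: a separate weight for every dimension of every annotation."""
    scores = np.tanh(annotations @ Wh + Wq @ query)    # (T, d) score per position and dimension
    alphas = softmax(scores, axis=0)                   # normalize each dimension over positions
    return (alphas * annotations).sum(axis=0)          # (d,) dimension-wise weighted context

# Toy usage
rng = np.random.default_rng(0)
T, d = 5, 8
h = rng.normal(size=(T, d))    # source annotations
s = rng.normal(size=d)         # decoder state
c_temporal = temporal_attention(s, h, rng.normal(size=(d, d)))
c_2d = fine_grained_attention(s, h, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```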

Cited by 167 publications (60 citation statements)
References 16 publications
“…where l is the index number of operation results and ranges from 1 to L. Then, the neural attention mechanism [43] is utilized as part of the decoder to generate prediction results. We firstly compute two weight factors in the neural attention mechanism as the following formulas:…”
Section: Decoding
confidence: 99%
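The excerpt stops before the formulas it refers to, so the snippet below is only a generic illustration of the usual two-step weight computation in an attention-based decoder (an unnormalized alignment score, then its softmax-normalized weight); it is not the cited paper's actual equations, and the names `attention_weights` and `Wa` are hypothetical.

```python
import numpy as np

def attention_weights(decoder_state, encoder_outputs, Wa):
    # Factor 1: raw alignment score of the decoder state against each encoder output.
    scores = encoder_outputs @ (Wa @ decoder_state)   # shape (L,)
    # Factor 2: softmax-normalized attention weight over the L positions.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return scores, weights
```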
“…(13), the attention score a_{ijk} = 1.0, leading to e_{ij} = square(r_{ijk}), which is similar to Eq. (5). In this case, SDM will measure the distance between target item j and context item k in the same way as the SDP model does.…”
Section: Attention Module
confidence: 99%
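The excerpt does not show Eq. (13) itself, so the following one-liner only illustrates the degenerate case it describes under an assumed reading of the term: when the attention score a_{ijk} equals 1.0, an attention-weighted squared term reduces to the plain squared distance square(r_{ijk}), i.e. the unweighted SDP-style measure of Eq. (5).

```python
def weighted_squared_term(a_ijk: float, r_ijk: float) -> float:
    # Assumed form of the attention-weighted term: a_ijk * r_ijk**2.
    return a_ijk * r_ijk ** 2

# With a_ijk = 1.0 the weighting disappears and only square(r_ijk) remains,
# i.e. the plain squared distance between target item j and context item k.
assert weighted_squared_term(1.0, 3.0) == 3.0 ** 2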
“…The autoencoder (AE) (Vincent et al. 2008) is one of the most popular unsupervised neural network approaches. It has been widely used as a performant mechanism for pre-training neural networks and for general-purpose feature learning (Choi et al. 2018). It compresses the representation of the input data, disentangling the main factors of variability, removing redundancies, and reducing the dimension of the input.…”
Section: Autoencoder
confidence: 99%
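As a companion to that description, here is a minimal autoencoder sketch in PyTorch: an encoder compresses the input to a low-dimensional code and a decoder reconstructs it, trained without labels against a reconstruction loss. The layer sizes, the MSE objective, and the `AutoEncoder` class name are illustrative choices, not taken from the cited works.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Encoder compresses the input to a code; decoder reconstructs the input."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Unsupervised training step: minimize reconstruction error on a toy batch.
model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                    # batch of flattened inputs
opt.zero_grad()
loss = nn.functional.mse_loss(model(x), x) # reconstruction objective
loss.backward()
opt.step()
```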