2014
DOI: 10.48550/arxiv.1409.0473
Preprint

Neural Machine Translation by Jointly Learning to Align and Translate

Abstract: Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and encode a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture t…

Cited by 3,479 publications (4,988 citation statements)
References 15 publications
“…The outputs of our CNN are passed into a bidirectional LSTM (16 hidden units, tanh activation function) network as a recurrent sequence learning model. Then, the LSTM outputs are passed to the attention network [10] to produce an attention score (e_t) for each attended frame (h_t): e_t = h_t w_a, where w_a represents the attention-layer weight matrix. From e_t, an importance attention weight (a_t) is computed for each attended frame:…”
Section: Model Architecture (mentioning)
confidence: 99%
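The scoring step in the excerpt above can be written out concretely. The following is a minimal numpy sketch, not the cited authors' code: a learned weight vector w_a maps each bidirectional-LSTM output h_t to a scalar score e_t, and the scores are normalised into importance weights a_t. The softmax normalisation and the final weighted sum are common choices assumed here, since the excerpt stops before defining a_t.

import numpy as np

T, H = 20, 32                 # 20 attended frames, 2 x 16 hidden units (biLSTM)
h = np.random.randn(T, H)     # stand-in for the biLSTM outputs h_t
w_a = np.random.randn(H)      # attention-layer weights (a vector here: one score per frame)

e = h @ w_a                   # e_t = h_t w_a, one attention score per frame
a = np.exp(e - e.max())
a /= a.sum()                  # assumed softmax: importance weights a_t sum to 1
context = a @ h               # weighted sum over frames (typical downstream use)
print(a.shape, context.shape)  # (20,) (32,)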
“…The attention mechanism was first proposed in the field of natural language processing (NLP) to align the input sequence and the output sequence [52]. Working with the LSTM-based encoder-decoder architecture, the attention mechanism has revealed great power in capturing long-range dependencies in sequences.…”
Section: B. Attention Mechanism (mentioning)
confidence: 99%
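For readers unfamiliar with the alignment idea this excerpt refers to, here is a small numpy sketch of the additive attention of Bahdanau et al.: at each decoding step the previous decoder state is scored against every encoder state, and the softmax-normalised scores act as a soft alignment between output and input positions. The dimensions and variable names below are illustrative, not taken from the cited work.

import numpy as np

def softmax(x):
    x = x - x.max()
    return np.exp(x) / np.exp(x).sum()

T_src, d_enc, d_dec, d_att = 6, 10, 12, 8
h = np.random.randn(T_src, d_enc)   # encoder states, one per source word
s_prev = np.random.randn(d_dec)     # previous decoder state

W = np.random.randn(d_att, d_dec)   # projects the decoder state
U = np.random.randn(d_att, d_enc)   # projects each encoder state
v = np.random.randn(d_att)

# e_j = v^T tanh(W s_{i-1} + U h_j): additive scoring of each source position
e = np.tanh(h @ U.T + s_prev @ W.T) @ v
alpha = softmax(e)                  # soft alignment over source positions
context = alpha @ h                 # context vector fed to the decoder
print(alpha.round(2), context.shape)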
“…For the forecasting, we apply the encoder-decoder model considering the search query data using an attention mechanism [5].…”
Section: Model Structure (mentioning)
confidence: 99%
“…Owing to the rapid development of neural networks, many models have been based on these, particularly convolutional neural networks (CNNs) [38,6,11] and recurrent neural networks (RNNs) [41,40,25], which capture the temporal variation. In recent years, there has been an increase in time-series prediction models using "attention" (Transformer) to achieve state-of-the-art performance in multiple natural language processing applications [5,42]. Attention generally aggregates temporal features using dynamically generated weights, thereby enabling the network to focus on significant time steps in the past directly.…”
Section: B. Related Work (mentioning)
confidence: 99%
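The "dynamically generated weights" mentioned in this excerpt can be illustrated with a Transformer-style scaled dot-product attention over past time steps (my own sketch, not code from the cited work): the weights come from query/key dot products on the data itself, so the model can emphasise whichever past steps are relevant at the current step.

import numpy as np

rng = np.random.default_rng(0)
T, d = 10, 16                      # 10 past time steps, feature size 16
x = rng.standard_normal((T, d))    # temporal features

W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
q = x[-1] @ W_q                    # query built from the most recent step
K = x @ W_k                        # keys for every past step
V = x @ W_v                        # values to be aggregated

scores = K @ q / np.sqrt(d)        # data-dependent relevance of each step
w = np.exp(scores - scores.max())
w /= w.sum()                       # attention weights over the T steps
summary = w @ V                    # weighted aggregation of temporal features
print(w.round(2), summary.shape)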