2014
DOI: 10.48550/arxiv.1406.1078
Preprint

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Cited by 2,824 publications (2,741 citation statements) | References 9 publications

“…Later, variants of RNNs were designed to deal with the vanishing gradient problem, for example LSTM [6] and GRU [7]. As the third category, Convolutional Neural Networks (CNNs) were initially designed for two-dimensional data such as images, and have achieved great success on visual tasks such as image classification and object detection. Later on, 1D CNNs were proposed for time series, which keep the parallel training ability of convolutions and their strong learning ability.…”
Section: Deep Neural Network
confidence: 99%
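For reference, the GRU these excerpts cite updates its hidden state through a reset gate and an update gate. Below is a minimal PyTorch sketch of a single GRU step following the update equations of the paper above; bias terms are omitted and the weight names and dimensions are illustrative, not taken from any cited work.

```python
import torch

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, W, U):
    """One GRU update as in Cho et al. (2014); biases omitted."""
    z = torch.sigmoid(x_t @ Wz + h_prev @ Uz)          # update gate z_t
    r = torch.sigmoid(x_t @ Wr + h_prev @ Ur)          # reset gate r_t
    h_tilde = torch.tanh(x_t @ W + (r * h_prev) @ U)   # candidate state
    return z * h_prev + (1 - z) * h_tilde              # interpolate old/new

# Smoke test with random weights; dimensions are arbitrary examples.
d_in, d_hid = 4, 3
Wz, Wr, W = (torch.randn(d_in, d_hid) for _ in range(3))
Uz, Ur, U = (torch.randn(d_hid, d_hid) for _ in range(3))
h = gru_step(torch.randn(2, d_in), torch.zeros(2, d_hid),
             Wz, Uz, Wr, Ur, W, U)
print(h.shape)  # torch.Size([2, 3])
```

The reset gate lets the unit drop irrelevant history before forming the candidate state, which is what mitigates the vanishing-gradient problem the excerpt mentions.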
“…Since the seminal work of [19], the task of VQA has attracted much research attention. The current VQA framework is mainly composed of a question feature extractor, an image feature extractor, and multi-modal fusion. Question feature extraction usually uses Long Short-Term Memory (LSTM) [20], Gated Recurrent Units (GRU) [21], or Skip-thought vectors [22]. The mainstream image feature extraction method is to use Faster R-CNN [23] instead of a traditional CNN, so that the task is connected with object detection and focuses on the salient regions of the image related to the question [24].…”
Section: A. Visual Question Answering
confidence: 99%
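The three-part pipeline this excerpt describes (question encoder, image feature extractor, multimodal fusion) can be sketched as follows. This is a generic illustration, not the cited model: the element-wise-product fusion, the mean-pooling over Faster R-CNN region features, and all dimensions are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class VQASketch(nn.Module):
    """Generic VQA skeleton: GRU question encoder plus precomputed
    region features, fused by element-wise product (one common choice;
    the cited works may differ)."""
    def __init__(self, vocab_size, emb_dim=300, hid_dim=1024,
                 img_dim=2048, n_answers=3000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hid_dim)
        self.classifier = nn.Linear(hid_dim, n_answers)

    def forward(self, question_ids, region_feats):
        # question_ids: (B, T) token ids; region_feats: (B, K, img_dim)
        # region features would come from a Faster R-CNN detector
        _, h = self.gru(self.embed(question_ids))     # h: (1, B, hid_dim)
        q = h.squeeze(0)                              # question vector
        v = self.img_proj(region_feats).mean(dim=1)   # pooled image vector
        return self.classifier(q * v)                 # fused -> answer logits
```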
“…Words are represented by 300-dimensional GloVe word embeddings $D = \{w_1, w_2, \ldots, w_n\} \in \mathbb{R}^{d_h \times n}$, where $d_h = 300$ denotes the dimension of each word representation. Finally, the word vectors are fed to the Gated Recurrent Units (GRU) [21] network to encode the question embedding $Q = \{q_1, q_2, \ldots, q_n\} \in \mathbb{R}^{d_s \times n}$, where $d_s = 1024$ is the dimension of each hidden state in the GRU.…”
Section: A. Feature Extraction
confidence: 99%
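Concretely, the shapes stated in this excerpt map onto a GRU encoder as in the minimal PyTorch sketch below, assuming batch size 1 and an arbitrary question length n; the random tensor stands in for the GloVe embedding lookup.

```python
import torch
import torch.nn as nn

d_h, d_s = 300, 1024               # dimensions stated in the excerpt
gru = nn.GRU(input_size=d_h, hidden_size=d_s, batch_first=True)

n = 12                             # question length (arbitrary example)
D = torch.randn(1, n, d_h)         # stand-in for GloVe vectors {w_1..w_n}
Q, _ = gru(D)                      # one 1024-d hidden state q_i per word
print(Q.shape)                     # torch.Size([1, 12, 1024])
```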
“…a diagnosis code) from the co-occurrence information, without considering the temporally sequential nature of EHR data. Furthermore, to consider both long-term dependency and sequential information, recurrent neural networks [13], [14], [15], [16], including LSTM [18] and GRU [19], are used to learn contextualized representations of EHR data. However, even predictive systems based on these algorithms still perform far below human capability, and cannot effectively improve care for individual patients.…”
Section: Introduction
confidence: 99%
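As a rough illustration of the approach this excerpt describes, and not of any of the cited systems, a patient's visits can be encoded as multi-hot diagnosis-code vectors and fed in time order to a GRU, so each visit's representation is conditioned on the history before it. The multi-hot input format and all dimensions are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class EHRGRUEncoder(nn.Module):
    """Sketch: contextualize a visit sequence with a GRU
    (illustrative, not a cited system)."""
    def __init__(self, n_codes, emb_dim=128, hid_dim=256):
        super().__init__()
        self.visit_proj = nn.Linear(n_codes, emb_dim)  # multi-hot -> dense
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, visits):
        # visits: (B, T, n_codes), one multi-hot vector per visit, time-ordered
        x = torch.relu(self.visit_proj(visits))
        out, h = self.gru(x)           # out: per-visit contextual states
        return out, h.squeeze(0)       # h: whole-history patient summary
```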