2014
DOI: 10.48550/arxiv.1409.1259
Preprint

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches

Cited by 1,072 publications (1,179 citation statements)
References 1 publication
“…We incentivised our model to recognise that sepsis will start within the next 6 h. To improve upon clinical baselines, we investigated two families of classifiers: deep learning approaches and non-deep ML approaches. As deep models, we considered a self-attention model (attn) [41] as well as a recurrent neural network employing Gated Recurrent Units (gru) [5], both of which are intrinsically capable of leveraging sequential data. In addition, we included LightGBM (lgbm) [17] and a LASSO-regularised logistic regression (lr) [39], which were given access to a total of 1,269 features extracted to make the temporal dynamics governing the data accessible to these methods.…”
Section: Results
confidence: 99%
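The citing study's exact architectures are not given in the quote, so the following is only a minimal sketch, assuming a PyTorch setting, of a GRU-based sequence classifier of the kind described (a single recurrent layer over per-time-step features with a sigmoid risk head). The feature count, hidden size, and sequence length are placeholder assumptions, not the cited study's settings.

```python
import torch
import torch.nn as nn

class GRUSepsisClassifier(nn.Module):
    """Hypothetical GRU classifier; all sizes are illustrative, not the cited study's."""
    def __init__(self, n_features: int = 40, hidden_size: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features), one row per measurement time step
        _, h_n = self.gru(x)                       # h_n: (1, batch, hidden_size)
        return self.head(h_n.squeeze(0)).squeeze(-1)

model = GRUSepsisClassifier()
risk = torch.sigmoid(model(torch.randn(8, 48, 40)))  # 8 stays, 48 time steps each
```

Whether the cited work read out the final hidden state or produced per-step predictions is not stated in the quote; this sketch simply takes the last hidden state.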
“…In this study, we investigated a comprehensive selection of supervised ML approaches. This includes i) deep self-attention models (attn) [41], ii) recurrent neural networks employing gated recurrent units (gru) [5], iii) LightGBM gradient boosting trees (lgbm) [17], and iv) LASSO-regularised [39] logistic regression (lr).…”
Section: Methods
confidence: 99%
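For the two non-deep baselines named in the quote, a hedged sketch of how LightGBM and a LASSO-regularised logistic regression might be fitted on pre-extracted features is given below. The random feature matrix, labels, and hyperparameters are placeholders, not the study's 1,269-feature configuration.

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))            # placeholder feature matrix
y = rng.integers(0, 2, size=500)           # placeholder binary labels

# Gradient-boosted trees baseline (lgbm)
lgbm = LGBMClassifier(n_estimators=200).fit(X, y)

# An L1 penalty with the liblinear solver gives the LASSO-style sparsity (lr)
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

print(lgbm.predict_proba(X[:5])[:, 1])
print(lasso_lr.predict_proba(X[:5])[:, 1])
```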
“…We then performed 7 × 7 max pooling with a stride of 5 × 5. The output of the CNN was reshaped and provided as input to an RNN with a gated recurrent unit (Cho et al. [51]) model of size 128, followed by a fully connected layer. We used the partial fine-tuning approach [52] for tuning the CNN component, where only the affine weights of the batch normalisation layers are updated while the rest of the weights in the CNN remain frozen.…”
Section: Methods
confidence: 99%
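The quote gives enough detail for a rough reconstruction. Below is a sketch of the pooling, GRU, and partial fine-tuning steps, assuming a torchvision ResNet-18 backbone and a two-class head (both assumptions, since the quote names neither the CNN nor the output size). Only the affine weights and biases of the batch-normalisation layers are left trainable, mirroring the partial fine-tuning described.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Assumed backbone: the quote does not specify which CNN was used.
backbone = resnet18(weights="IMAGENET1K_V1")
cnn = nn.Sequential(*list(backbone.children())[:-2])   # keep conv feature maps

# Partial fine-tuning: freeze everything, then re-enable only the affine
# parameters (weight/bias) of the batch-normalisation layers.
for p in cnn.parameters():
    p.requires_grad = False
for m in cnn.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.weight.requires_grad = True
        m.bias.requires_grad = True

pool = nn.MaxPool2d(kernel_size=7, stride=5)            # 7x7 max pooling, stride 5
gru = nn.GRU(input_size=512, hidden_size=128, batch_first=True)
fc = nn.Linear(128, 2)                                  # assumed 2-class output

x = torch.randn(4, 3, 224, 224)                         # placeholder image batch
feat = pool(cnn(x))                                     # (4, 512, 1, 1) for 224x224 input
seq = feat.flatten(2).transpose(1, 2)                   # reshape to (batch, steps, 512)
out = fc(gru(seq)[1].squeeze(0))                        # GRU of size 128, then FC layer
```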
“…This operation requires traversing the input from the first time-step to the last one, which is computationally expensive [25]. Even though improved RNN variants such as LSTM [26] and GRU [27] can effectively reduce the difficulty of parameter updates during training, the sequential arrangement of different modal data introduces unnecessary sequential priors, which can force the model to learn an unreasonable one-way information flow when modelling the inter-modal relationships to fit the main features, degrading the effectiveness of feature extraction [28,29].…”
Section: Lite Attention Mechanism
confidence: 99%
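The sequential-traversal cost the quote refers to can be made concrete with a small sketch: a GRU (like any recurrent cell) must be unrolled step by step because each hidden state depends on the previous one, so the time loop below cannot be parallelised across time. The tensor shapes are purely illustrative.

```python
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=32, hidden_size=64)
x = torch.randn(50, 8, 32)                 # (time, batch, features), illustrative
h = torch.zeros(8, 64)

hidden_states = []
for t in range(x.size(0)):                 # strictly sequential: h_t needs h_{t-1}
    h = cell(x[t], h)
    hidden_states.append(h)
outputs = torch.stack(hidden_states)       # (time, batch, hidden)
```

Self-attention, by contrast, relates all time steps to one another in a single parallel operation, which is the contrast the cited passage draws.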