Recurrent Neural Networks, 2008
DOI: 10.5772/5542

Application of Recurrent Neural Networks to Rainfall-runoff Processes

Cited by 2 publications (2 citation statements; citing works published in 2011 and 2023). References 29 publications.

“…Static positional encoding computes gradients by accumulation, so they depend only on the sequence length and the standard deviation of the word vectors. RNNs compute gradients by repeated multiplication at almost all positions during backpropagation; these gradients are close to 0 [17], so RNNs may face the vanishing-gradient problem. For the first transformer block, the standard deviation of the word vectors is usually on the order of 0.02 [4]; for the other blocks, LayerNormalization ensures that the standard deviation is on the order of 1 [17].…”
Section: Cumsum Calculation
confidence: 99%
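The contrast this excerpt draws, gradients that shrink under repeated multiplication versus gradients preserved under accumulation, can be made concrete with a small experiment. Below is a minimal PyTorch sketch; the scalar recurrence h_t = w * h_{t-1} + x_t standing in for an RNN and the plain cumsum standing in for the accumulation path are illustrative assumptions, not the models used in the cited work.

```python
import torch

T = 100  # sequence length

# RNN-style path: toy scalar recurrence h_t = w * h_{t-1} + x_t.
# Backprop delivers dh_T/dx_t = w**(T-1-t) to position t, a repeated
# product that shrinks geometrically when |w| < 1.
w = torch.tensor(0.5)
x = torch.ones(T, requires_grad=True)
h = torch.zeros(())
for t in range(T):
    h = w * h + x[t]
h.backward()
print(x.grad[0].item())    # ~0.5**99, effectively zero (vanished)
print(x.grad[-1].item())   # 1.0

# Accumulation path: a cumsum, as in static positional encoding.
# Every position receives a gradient of exactly 1, regardless of T.
x2 = torch.ones(T, requires_grad=True)
y = torch.cumsum(x2, dim=0)
y[-1].backward()
print(x2.grad[0].item())   # 1.0
print(x2.grad[-1].item())  # 1.0
```

The gradient reaching early positions in the recurrence decays as w**(T-1-t), while the cumsum passes a constant gradient to every position, independent of sequence length, which is the sense in which accumulation avoids the vanishing-gradient problem.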
“…RNNs compute gradients by repeated multiplication at almost all positions during backpropagation; these gradients are close to 0 [17], so RNNs may face the vanishing-gradient problem. For the first transformer block, the standard deviation of the word vectors is usually on the order of 0.02 [4]; for the other blocks, LayerNormalization ensures that the standard deviation is on the order of 1 [17]. The position parameter of the cumsum calculation always maintains a reasonable gradient.…”
Section: Cumsum Calculation
confidence: 99%
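The standard-deviation figures quoted in these excerpts are also easy to reproduce numerically. A sketch under stated assumptions: word vectors drawn with std 0.02 (a common GPT/BERT-style embedding initialization matching the value quoted above) and PyTorch's nn.LayerNorm; the batch size and model dimension are arbitrary choices for illustration.

```python
import torch
from torch import nn

batch, d_model = 8, 512

# Word vectors with std ~0.02, the order of magnitude quoted for the
# first block (assumed GPT/BERT-style init, for illustration only).
emb = torch.randn(batch, d_model) * 0.02
print(emb.std().item())  # ~0.02

# LayerNorm standardizes each vector to zero mean and unit variance
# before its learnable affine (initialized to the identity), so the
# following blocks see word vectors with std on the order of 1.
ln = nn.LayerNorm(d_model)
out = ln(emb)
print(out.std().item())  # ~1.0
```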