1999
DOI: 10.1201/9781420049176.ch6

Learning Long-Term Dependencies in NARX Recurrent Neural Networks

Cited by 104 publications (143 citation statements)
References 12 publications
“…To deal with long time lags between relevant events, several sequence processing methods were proposed, including Focused BP based on decay factors for activations of units in RNNs (Mozer, 1989, 1992), Time-Delay Neural Networks (TDNNs) (Lang et al., 1990) and their adaptive extension (Bodenhausen and Waibel, 1991), Nonlinear AutoRegressive with eXogenous inputs (NARX) RNNs (Lin et al., 1996), certain hierarchical RNNs (Hihi and Bengio, 1996) (compare Sec. 5.10, 1991), RL economies in RNNs with WTA units and local learning rules (Schmidhuber, 1989b), and other methods (e.g., Ring, 1993, 1994; Plate, 1993; de Vries and Principe, 1991; Sun et al., 1993a; Bengio et al., 1994).…”
Section: Ideas For Dealing With Long Time Lags And Deep CAPs
confidence: 99%
“…The above scenario holds for all recurrent structures. However, one can postpone the vanishing of the gradient in NARX recurrent neural networks by increasing the number of delays in the output delay line of this architecture [6]. As may be seen in Fig.…”
Section: NARX Recurrent Neural Network
confidence: 99%
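The mechanism this excerpt credits to the cited paper can be made concrete with a minimal sketch. Everything below (function name, layer sizes, random initialisation) is an illustrative assumption, not the cited authors' model: the point is only that a NARX network feeds the last d_out outputs back through a tapped delay line, so y[t] depends directly on outputs up to d_out steps in the past.

```python
import numpy as np

def narx_forward(u, d_in=2, d_out=4, hidden=8, seed=0):
    """Run a randomly initialised NARX-style network over the input sequence u.

    The output delay line (d_out taps) is the feature the excerpt above
    credits with slowing gradient decay: y[t] depends directly on
    y[t-1], ..., y[t-d_out], so a gradient can skip d_out steps in one hop.
    """
    rng = np.random.default_rng(seed)
    W_u = rng.normal(scale=0.1, size=(hidden, d_in))   # weights on input taps
    W_y = rng.normal(scale=0.1, size=(hidden, d_out))  # weights on output taps
    w_o = rng.normal(scale=0.1, size=hidden)           # hidden-to-output weights

    y = np.zeros(len(u))
    for t in range(len(u)):
        # Tapped delay lines: current + past inputs, and past outputs.
        u_taps = np.array([u[t - k] if t - k >= 0 else 0.0 for k in range(d_in)])
        y_taps = np.array([y[t - k] if t - k >= 0 else 0.0 for k in range(1, d_out + 1)])
        h = np.tanh(W_u @ u_taps + W_y @ y_taps)
        y[t] = w_o @ h                                  # fed back on the next step
    return y

# Example: forward pass over a 50-step sine input (training is not shown).
print(narx_forward(np.sin(0.3 * np.arange(50)))[:5])
```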
“…if it is to store information for a long period of time in the presence of noise, then for a term with $u \ll t$, $\partial y(t)/\partial y(u) \to 0$ [5]. Under this condition the gradient decays exponentially [6], meaning there is no chance for terms far from $t$ to change the weights in a way that lets the network's state jump to a better basin of attraction. The above scenario holds for all recurrent structures.…”
Section: NARX Recurrent Neural Network
confidence: 99%
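A compact reconstruction of the argument this excerpt summarizes, in standard notation (not quoted from references [5] or [6]): the sensitivity of the state at time $t$ to the state at an earlier time $u$ factors over every intermediate step,

\[
\frac{\partial y(t)}{\partial y(u)} \;=\; \prod_{k=u+1}^{t} \frac{\partial y(k)}{\partial y(k-1)},
\qquad
\left\lVert \frac{\partial y(t)}{\partial y(u)} \right\rVert \;\le\; \lambda^{\,t-u},
\]

so if each Jacobian factor has norm at most some $\lambda < 1$ (the regime needed to store information robustly in the presence of noise), the contribution of the term at time $u$ to the gradient vanishes exponentially in $t-u$. With an output delay line of $D$ taps, the shortest chain of dependencies from $y(t)$ back to $y(u)$ needs only about $(t-u)/D$ factors, giving the weaker bound $\lambda^{(t-u)/D}$: the decay is still exponential, but its rate is slowed by the factor $D$, which is the sense in which adding output delays postpones the vanishing gradient.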