2022
DOI: 10.48550/arxiv.2210.01245
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Random orthogonal additive filters: a solution to the vanishing/exploding gradient of deep neural networks

Abstract: Since the recognition in the early nineties of the vanishing/exploding (V/E) gradient issue plaguing the training of neural networks (NNs), significant efforts have been exerted to overcome this obstacle. However, a clear solution to the V/E issue remained elusive so far. In this manuscript a new architecture of NN is proposed, designed to mathematically prevent the V/E issue to occur. The pursuit of approximate dynamical isometry, i.e. parameter configurations where the singular values of the input-output Jac… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1

Citation Types

0
1
0

Year Published

2024
2024
2024
2024

Publication Types

Select...
1

Relationship

0
1

Authors

Journals

citations
Cited by 1 publication
(1 citation statement)
references
References 36 publications
0
1
0
Order By: Relevance
“…Long Short-Term Memory (LSTM) [42][43][44] is a specialized deep learning technique designed for analyzing sequential data, addressing issues found in conventional recurrent neural networks (RNNs) [45][46][47] and other machine learning algorithms. It was proposed by Hochreiter and Schmidhuber [48] to overcome the gradient vanishing problem and enhance the effectiveness of RNNs [49][50][51][52][53]. LSTM enables the retention and utilization of long-term information in a network.…”
Section: Long Short-term Memorymentioning
confidence: 99%
“…Long Short-Term Memory (LSTM) [42][43][44] is a specialized deep learning technique designed for analyzing sequential data, addressing issues found in conventional recurrent neural networks (RNNs) [45][46][47] and other machine learning algorithms. It was proposed by Hochreiter and Schmidhuber [48] to overcome the gradient vanishing problem and enhance the effectiveness of RNNs [49][50][51][52][53]. LSTM enables the retention and utilization of long-term information in a network.…”
Section: Long Short-term Memorymentioning
confidence: 99%