2022
DOI: 10.48550/arxiv.2204.11326
Preprint
The Multiscale Structure of Neural Network Loss Functions: The Effect on Optimization and Origin

Abstract: Local quadratic approximation has been extensively used to study the optimization of neural network loss functions around the minimum. However, it usually holds only in a very small neighborhood of the minimum and cannot explain many phenomena observed during the optimization process. In this work, we study the structure of neural network loss functions and its implications for optimization in a region beyond the reach of a good quadratic approximation. Numerically, we observe that neural network loss functions possess…
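For reference, the "local quadratic approximation" the abstract refers to is the standard second-order Taylor expansion of the loss around a minimum; the display below uses conventional notation (theta* for the minimum, H for the Hessian), which is an assumption rather than the paper's own notation:

L(\theta) \approx L(\theta^*) + \tfrac{1}{2}\,(\theta - \theta^*)^{\top} H \,(\theta - \theta^*), \qquad H = \nabla^{2} L(\theta^*),

since the first-order term vanishes at the minimum. The abstract's point is that this approximation is reliable only very close to \theta^*.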

Cited by 5 publications (5 citation statements)
References 9 publications (15 reference statements)
“…The precision, recall, and F1-score metrics were reported for macro, micro, and weighted averaging. Additionally, the accuracy and loss function [16] were plotted against the number of epochs, revealing that the testing and training accuracy improved as the number of epochs increased, and the loss function decreased gradually with increasing epochs.…”
Section: Discussion
confidence: 99%
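The evaluation described in that excerpt can be reproduced with standard tooling; the sketch below is a minimal illustration using scikit-learn and matplotlib, with hypothetical y_true, y_pred, and history values standing in for the cited study's actual data and training framework.

```python
# Minimal sketch (not the cited study's code): macro/micro/weighted
# precision, recall, and F1, plus accuracy/loss curves over epochs.
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical labels and predictions standing in for real model output.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

for avg in ("macro", "micro", "weighted"):
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average=avg)
    print(f"{avg:>8}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")

# Hypothetical per-epoch history, e.g. as returned by a Keras-style fit().
history = {
    "accuracy":     [0.61, 0.72, 0.80, 0.85, 0.88],
    "val_accuracy": [0.58, 0.69, 0.77, 0.81, 0.83],
    "loss":         [1.05, 0.78, 0.55, 0.42, 0.35],
    "val_loss":     [1.10, 0.84, 0.63, 0.51, 0.46],
}
epochs = range(1, len(history["loss"]) + 1)
plt.plot(epochs, history["accuracy"], label="train accuracy")
plt.plot(epochs, history["val_accuracy"], label="test accuracy")
plt.plot(epochs, history["loss"], label="train loss")
plt.plot(epochs, history["val_loss"], label="test loss")
plt.xlabel("epoch")
plt.legend()
plt.show()
```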
“…The SGD optimization algorithm is used to optimize the biases and weights of the proposed Bi-LSTM-based CSI estimator. In this study, the suggested estimator is trained using one of three loss functions: Hinge [55], MSLE [56], and KLD [57]. The loss function quantifies the difference between the predicted and observed outcomes.…”
Section: Offline Training of the Proposed Bi-LSTM Scheme
confidence: 99%
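As a point of reference only, the three loss functions named in that excerpt can be written down directly; the NumPy sketch below uses their textbook definitions and is not the cited estimator's implementation.

```python
# Minimal NumPy sketch of the three losses named above (textbook forms).
import numpy as np

def hinge_loss(y_true, y_pred):
    # y_true in {-1, +1}; penalizes predictions on the wrong side of the margin.
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))

def msle_loss(y_true, y_pred):
    # Mean squared logarithmic error; assumes non-negative targets and predictions.
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

def kld_loss(p_true, q_pred, eps=1e-12):
    # Kullback-Leibler divergence between two discrete probability distributions.
    p = np.clip(p_true, eps, 1.0)
    q = np.clip(q_pred, eps, 1.0)
    return np.sum(p * np.log(p / q))

# Hypothetical example values.
print(hinge_loss(np.array([1, -1, 1]), np.array([0.8, 0.3, -0.2])))
print(msle_loss(np.array([3.0, 5.0]), np.array([2.5, 5.5])))
print(kld_loss(np.array([0.7, 0.3]), np.array([0.6, 0.4])))
```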
“…Arora et al (2022) prove that the edge of stability result occurs under certain conditions on either the learning rate or the loss function. Ma et al (2022) empirically observe the multi-scale structure of the loss landscape in neural networks and use it to theoretically explain the edge of stability behavior of gradient descent. Chen and Bruna (2022) use low-dimensional theoretical insights around a local minimum to understand the edge of stability behavior.…”
Section: Edge of Stability and the Importance of the Hessian
confidence: 99%
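To make the 2/η threshold underlying the edge-of-stability discussion concrete, the sketch below runs plain gradient descent on a one-dimensional quadratic with curvature just below and just above 2/lr. It illustrates only the classical stability condition that this literature builds on, not the cited papers' neural-network experiments.

```python
# Minimal sketch of the classical GD stability threshold: on a quadratic
# f(x) = 0.5 * sharpness * x**2, gradient descent with step size lr
# converges iff sharpness < 2 / lr, and oscillates/diverges above it.
import numpy as np

def run_gd(sharpness, lr=0.1, x0=1.0, steps=50):
    x = x0
    for _ in range(steps):
        x = x - lr * sharpness * x  # gradient of 0.5 * sharpness * x**2 is sharpness * x
    return x

lr = 0.1
threshold = 2.0 / lr  # = 20.0
for sharpness in (threshold - 1.0, threshold + 1.0):
    final_x = run_gd(sharpness, lr=lr)
    print(f"sharpness={sharpness:5.1f} (2/lr={threshold:.1f}): |x_final|={abs(final_x):.3e}")
```

Running this shows the iterate shrinking toward zero when the curvature sits just below 2/lr and blowing up when it sits just above, which is the Hessian-based stability condition the excerpt refers to.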