2015 IEEE Congress on Evolutionary Computation (CEC)
DOI: 10.1109/cec.2015.7256883

Saturation in PSO neural network training: Good or evil?

Cited by 14 publications (11 citation statements). References 11 publications.
“…Instead of using a random approach such as Latin Hypercube sampling, in the future, different deterministic and pseudo-random sampling strategies such as Sparse Grid sampling or Sobol sequences can be employed to further improve the performance of the model. Furthermore, it is critical to obtain the statistics of saturation along different parts of the solution domain during the training of DNNs (Glorot and Bengio, 2010; Rakitianskaia and Engelbrecht, 2015b). Saturation occurs when the hidden units of a DNN predominantly output values close to the asymptotic ends of the activation function range, which reduces the particular PINNs model to a binary state, thus limiting the overall information capacity of the NN (Rakitianskaia and Engelbrecht, 2015a; Bai et al., 2019).…”
Section: Discussion (mentioning)
confidence: 99%
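To make "outputs close to the asymptotic ends" concrete, the sketch below computes a simple saturation statistic: the fraction of tanh hidden activations whose magnitude exceeds a threshold near the asymptotes. This is an illustrative measure only; the function name, the 0.9 threshold, and the synthetic activations are assumptions, not the exact metric used by the cited works.

```python
import numpy as np

def saturation_fraction(hidden_activations, threshold=0.9):
    """Fraction of activations near the asymptotes of tanh (illustrative measure)."""
    acts = np.asarray(hidden_activations)
    return float(np.mean(np.abs(acts) > threshold))

# Example: activations of one hidden layer over a batch of inputs;
# large pre-activation magnitudes push tanh outputs toward +/-1 (saturation).
rng = np.random.default_rng(0)
pre_activations = rng.normal(scale=4.0, size=(128, 10))
print(f"saturated fraction: {saturation_fraction(np.tanh(pre_activations)):.2f}")
```

Tracking such a statistic per layer over the course of training is one way to observe the saturation behaviour discussed above.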
“…The sigmoid is linear around the origin, and saturates (approaches asymptotes) for inputs of large magnitude. Neuron saturation is generally undesirable, since the gradient is very weak near the asymptotes, and may cause stagnation in the training algorithms [31].…”
Section: Activation Functions (mentioning)
confidence: 99%
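The weak-gradient claim is easy to verify numerically: the logistic sigmoid's derivative σ'(x) = σ(x)(1 − σ(x)) peaks at 0.25 at the origin and collapses toward zero for inputs of large magnitude. A short, self-contained check (the sample points and variable names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The derivative sigmoid(x) * (1 - sigmoid(x)) is 0.25 at x = 0 (linear region)
# and nearly zero once the unit is saturated.
for x in [0.0, 2.0, 5.0, 10.0]:
    s = sigmoid(x)
    print(f"x={x:5.1f}  sigmoid={s:.5f}  derivative={s * (1.0 - s):.6f}")
```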
“…Bounded activation functions such as sigmoid and hyperbolic tangent (TanH) are prone to saturation, which was shown to be detrimental to NN performance for shallow [31] and deep [12] architectures alike. Modern activation functions such as rectified linear unit (ReLU) [27] and exponential linear unit (ELU) [7] are less prone to saturation, and thus became the primary choice for deep learning [1].…”
mentioning
confidence: 99%
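A quick numerical comparison makes the contrast concrete: sigmoid and tanh pin large-magnitude inputs to their asymptotes, whereas ReLU and ELU retain a non-vanishing response on the positive side. The sketch below uses α = 1 for ELU; the chosen input values are illustrative.

```python
import numpy as np

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
sigmoid = 1.0 / (1.0 + np.exp(-x))
tanh = np.tanh(x)
relu = np.maximum(0.0, x)
elu = np.where(x > 0.0, x, np.exp(x) - 1.0)  # ELU with alpha = 1

# Bounded activations flatten out for large |x|; ReLU/ELU do not on the positive side.
for xi, s, t, r, e in zip(x, sigmoid, tanh, relu, elu):
    print(f"x={xi:6.1f}  sigmoid={s:7.4f}  tanh={t:7.4f}  relu={r:6.1f}  elu={e:8.4f}")
```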
“…While the previously mentioned literature discusses PSO's ability to train an ANN, none of the literature attempts to discuss why this may be. [37] hypothesized that the deficiency of PSO may be due to hidden layer saturation. [37] found that while a certain degree of saturation was required for ANN success, higher levels of saturation were found to be unsatisfactory and would lead to overfitting.…”
Section: Particle Swarm Optimization (mentioning)
confidence: 99%
“…[43] found that non-gradient based learning can be sensitive to the degree of saturation present in an ANN.…”
Section: Particle Swarm Optimization (mentioning)
confidence: 99%
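For readers who want to see how saturation can be monitored while PSO trains a network, the following sketch runs a basic gbest PSO over the weights of a small 2-3-1 tanh network on XOR and logs the fraction of saturated hidden units. The swarm size, inertia and acceleration coefficients, saturation threshold, and toy task are assumptions for illustration, not the experimental setup of the cited paper.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

H = 3                                    # hidden units (illustrative)
dim = 2 * H + H + H * 1 + 1              # weights + biases of a 2-H-1 network

def unpack(p):
    i = 0
    W1 = p[i:i + 2 * H].reshape(2, H); i += 2 * H
    b1 = p[i:i + H]; i += H
    W2 = p[i:i + H].reshape(H, 1); i += H
    b2 = p[i:]
    return W1, b1, W2, b2

def forward(p, X):
    W1, b1, W2, b2 = unpack(p)
    hidden = np.tanh(X @ W1 + b1)                       # bounded hidden layer
    out = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))     # sigmoid output
    return hidden, out

def mse(p):
    _, out = forward(p, X)
    return np.mean((out - y) ** 2)

# gbest PSO with commonly used (but here assumed) coefficients.
n_particles, iters = 30, 300
w, c1, c2 = 0.729, 1.494, 1.494
pos = rng.uniform(-1, 1, (n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([mse(p) for p in pos])
gbest = pbest[np.argmin(pbest_f)].copy()

for t in range(iters):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    f = np.array([mse(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[np.argmin(pbest_f)].copy()
    if t % 100 == 0:
        hidden, _ = forward(gbest, X)
        sat = np.mean(np.abs(hidden) > 0.9)   # crude saturation measure
        print(f"iter {t:3d}  mse={pbest_f.min():.4f}  saturation={sat:.2f}")
```

Logging the saturation measure alongside the training error, as done here, is one simple way to study the relationship between hidden layer saturation and PSO training performance that these citing works discuss.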