Stage-based Hyper-parameter Optimization for Deep Learning

Shin, Ahnjae; Shin, Dongjin; Cho, Sung‐Woo; Kim, Do Yoon; Jeong, Eunji; Yu, Gyeong-In; Chun, Byung-Gon

doi:10.48550/arxiv.1911.10504

Cited by 2 publications

(1 citation statement)

References 15 publications


“…The DNN model Shin et al, [2] will constitute a nested architecture of layers in which the number of parameters is in millions. The deep model's high degrees of freedom will enable it to be approximate non-linear as well as linear functions; despite that, this model is constantly at the risk of overfitting to training data.…”

Section: Introduction (mentioning)

confidence: 99%

Hyperparameter Tuning for Deep Neural Networks Based Optimization Algorithm

Vidyabharathi, Mohanraj

2023

Intelligent Automation & Soft Computing


The standard technique for training current Neural Network (NN) models is to use decaying Learning Rates (LR). Most of these techniques start with a large LR and decay it several times during training. Decay has been shown to improve both generalization and optimization. Other hyperparameters, such as the network's size, the number of hidden layers, dropout to avoid overfitting, and batch size, are chosen solely by heuristics. This work proposes an Adaptive Teaching Learning Based (ATLB) heuristic to identify the optimal hyperparameters for diverse networks. We consider three Deep Neural Network architectures for classification: Recurrent Neural Networks (RNN), Long Short Term Memory (LSTM), and Bidirectional Long Short Term Memory (BiLSTM). The proposed ATLB is evaluated with several learning rate schedulers: Cyclical Learning Rate (CLR), Hyperbolic Tangent Decay (HTD), and Toggle between Hyperbolic Tangent Decay and Triangular mode with Restarts (T-HTR). Experimental results show performance improvements on the 20Newsgroup, Reuters Newswire and IMDB datasets.
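The abstract names two learning rate schedulers, CLR and HTD, without giving their formulas. The sketch below is a minimal illustration based on the commonly published forms of the triangular cyclical policy and hyperbolic tangent decay; it is not taken from the cited paper. The function names, the default values (`base_lr`, `max_lr`, `step_size`, `lower`, `upper`) and the per-step interface are all assumptions chosen for illustration.

```python
import math


def triangular_clr(iteration, base_lr=0.001, max_lr=0.006, step_size=2000):
    """Triangular cyclical learning rate (assumed form, not the paper's code).

    The rate rises linearly from base_lr to max_lr over step_size
    iterations, then falls back to base_lr over the next step_size.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    # x is the normalized distance from the peak of the current cycle.
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


def htd(step, total_steps, initial_lr=0.1, lower=-6.0, upper=3.0):
    """Hyperbolic tangent decay (assumed form, not the paper's code).

    The rate stays close to initial_lr early in training and decays
    smoothly toward a small value as step approaches total_steps.
    """
    progress = step / total_steps
    return initial_lr / 2.0 * (1.0 - math.tanh(lower + (upper - lower) * progress))


if __name__ == "__main__":
    # Print a few sample points from each schedule.
    for it in (0, 1000, 2000, 3000, 4000):
        print("CLR", it, triangular_clr(it))
    for step in (0, 25, 50, 75, 100):
        print("HTD", step, htd(step, total_steps=100))
```

A toggling scheme such as T-HTR would presumably switch between schedules like these at restart points, but the abstract does not specify how, so it is left out of the sketch.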
