In this study, we provide a review of meta-heuristic methods such as Genetic Algorithms, Particle Swarm Optimization, Differential Evolution, and Bayesian Optimization that have been used extensively to optimize hyper-parameters in Convolutional Neural Networks (CNNs). We highlight the hyper-parameters that were selected for optimization in those studies, along with the value domains of those parameters. These studies reveal that the number of layers, the number of kernels and their sizes at each layer, the learning rate, and the batch size are among the hyper-parameters that affect the performance of CNNs the most.

Figure A. Structure of convolutional neural networks.

Purpose: In this study, meta-heuristic methods that have been used to optimize convolutional neural networks are investigated. A performance comparison of these methods on different image datasets is presented. The advantages and disadvantages of the optimization approaches are discussed with the aim of providing the reader with the important points that should be considered during the hyper-parameter selection process.

Results: The definition of "the best" set of hyper-parameters in convolutional neural networks depends on the problem, or in this case, on the dataset. However, it is clear from the studies that the selection of some parameters directly affects the performance of the networks. The number of layers, the number of filters in each layer and their sizes, the regularization method, the learning rate, and the batch size are among the most important parameters. Genetic Algorithms (GA) are the most widely studied technique for hyper-parameter optimization, largely because they yield successful results in most of the studies. When selecting an optimization method, one should consider the size of the problem, the available computational budget, and time; accuracy expectations should also be taken into account. For problems with a small hyper-parameter search space, methods like Grid Search are sufficient, but for problems with a large search space, meta-heuristic methods are more suitable.

Conclusion: In this study, the effect of hyper-parameter optimization methods on classification performance is investigated. GA and Particle Swarm Optimization (PSO) are the two most widely used meta-heuristics for hyper-parameter optimization. Their computational burden can be justified by the accuracy improvements they achieve. If computational resources are limited and good results are desired in a reasonable amount of time, then other methods such as TPE and SMAC are good choices.
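To make the search procedure concrete, the following is a minimal sketch of a genetic-algorithm loop over the hyper-parameters the review identifies as most influential (layers, filters, kernel size, learning rate, batch size). The value ranges, operator choices, and the placeholder fitness function are assumptions made only so the sketch runs; in practice `evaluate` would train a CNN and return its validation accuracy.

```python
import random

# Illustrative search space; the specific value ranges are assumptions.
SEARCH_SPACE = {
    "num_layers":    [2, 3, 4, 5, 6],
    "num_filters":   [16, 32, 64, 128],
    "kernel_size":   [3, 5, 7],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "batch_size":    [32, 64, 128, 256],
}

def random_config():
    """Sample one candidate configuration uniformly from the space."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(config):
    """Placeholder fitness: would normally train a CNN with `config` and
    return validation accuracy. A dummy score keeps the sketch runnable."""
    return random.random()

def crossover(a, b):
    """Uniform crossover: each hyper-parameter comes from either parent."""
    return {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(config, rate=0.2):
    """Resample each hyper-parameter with probability `rate`."""
    return {k: (random.choice(SEARCH_SPACE[k]) if random.random() < rate else v)
            for k, v in config.items()}

def genetic_search(pop_size=10, generations=5, elite=2):
    population = [random_config() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        parents = scored[:elite]  # keep the best configurations
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - elite)]
        population = parents + children
    return max(population, key=evaluate)

if __name__ == "__main__":
    print(genetic_search())
```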
The success of Convolutional Neural Networks is highly dependent on the selected architecture and hyper-parameters. The need for automatic design of networks is especially important for complex architectures, where the parameter space is so large that trying all possible combinations is computationally infeasible. In this study, the Microcanonical Optimization algorithm, a variant of the Simulated Annealing method, is used for hyper-parameter optimization and architecture selection for Convolutional Neural Networks. To the best of our knowledge, our study provides the first attempt at applying Microcanonical Optimization to this task. The networks generated by the proposed method are compared to the networks generated by the Simulated Annealing method in terms of both accuracy and size, using six widely used image recognition datasets. Moreover, a performance comparison with the Tree Parzen Estimator, a Bayesian optimization-based approach, is also presented. It is shown that the proposed method is able to achieve competitive classification results with state-of-the-art architectures. When the size of the networks is also taken into account, the networks generated by the Microcanonical Optimization method contain far fewer parameters than the state-of-the-art architectures. Therefore, the proposed method is preferable for automatically tuning networks, especially in situations where fast training is as important as accuracy. INDEX TERMS Convolutional neural networks, hyper-parameter optimization, microcanonical optimization, tree Parzen estimator.
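For readers unfamiliar with Microcanonical Optimization, the following is a minimal sketch of its demon-based acceptance rule applied to a toy hyper-parameter encoding. The search space, neighbour move, and cost function are assumptions made only so the sketch runs and do not reproduce the paper's actual operators; `cost` would normally be one minus the validation accuracy of a trained CNN.

```python
import random

# Toy search space; ranges are illustrative assumptions.
SEARCH_SPACE = {
    "num_layers":  [2, 3, 4, 5, 6],
    "num_filters": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
}

def cost(config):
    """Placeholder for (1 - validation accuracy) of a CNN trained with `config`."""
    return random.random()

def neighbour(config):
    """Perturb a single randomly chosen hyper-parameter."""
    key = random.choice(list(SEARCH_SPACE))
    new = dict(config)
    new[key] = random.choice(SEARCH_SPACE[key])
    return new

def microcanonical_search(iterations=100, demon_energy=0.5):
    current = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
    current_cost = cost(current)
    best, best_cost = current, current_cost
    for _ in range(iterations):
        candidate = neighbour(current)
        delta = cost(candidate) - current_cost
        # Demon rule: improving moves are always accepted and increase the
        # demon's energy; worsening moves are accepted only if the demon can
        # "pay" for the cost increase, which then drains its energy.
        if delta <= demon_energy:
            demon_energy -= delta
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best

if __name__ == "__main__":
    print(microcanonical_search())
```

Unlike standard Simulated Annealing, this rule involves no temperature schedule or probabilistic acceptance: the demon's finite energy budget bounds how much the search can temporarily worsen the solution.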
In this study, we model CNN hyper-parameter optimization as a bi-criteria optimization problem, where the first objective is classification accuracy and the second is computational complexity, measured in terms of the number of floating-point operations. For this bi-criteria problem, we develop a Multi-Objective Simulated Annealing (MOSA) algorithm to obtain high-quality solutions with respect to both objectives. CIFAR-10 is selected as the benchmark dataset, and the MOSA trade-off fronts obtained for this dataset are compared to the fronts generated by a single-objective Simulated Annealing (SA) algorithm with respect to several front evaluation metrics such as generational distance, spacing, and spread. The comparison results suggest that the MOSA algorithm searches the objective space more effectively than the SA method. For each method, selected front solutions are trained for longer in order to observe their actual performance on the original test set. Again, the results show that MOSA performs better than SA in the multi-objective setting. The performance of the MOSA configurations is also compared to other search-generated and human-designed state-of-the-art architectures. It is shown that the network configurations generated by MOSA are not dominated by those architectures, and the proposed method can be of great use when computational complexity is as important as test accuracy.
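The notion of dominance underpinning the trade-off fronts above can be illustrated with a short sketch of the bi-criteria bookkeeping: each candidate network is scored by classification error and FLOPs (both to be minimised), and a non-dominated archive approximates the front. This is only the dominance/archive piece of a MOSA-style search, not the full algorithm; the `ConfigScore` fields and example values are illustrative assumptions, not results from the paper.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ConfigScore:
    name: str
    error: float   # 1 - test accuracy, lower is better
    gflops: float  # computational complexity, lower is better

def dominates(a: ConfigScore, b: ConfigScore) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in at least one."""
    return (a.error <= b.error and a.gflops <= b.gflops and
            (a.error < b.error or a.gflops < b.gflops))

def update_front(front: List[ConfigScore], candidate: ConfigScore) -> List[ConfigScore]:
    """Add `candidate` to the archive if no member dominates it,
    removing any members it dominates."""
    if any(dominates(member, candidate) for member in front):
        return front
    return [m for m in front if not dominates(candidate, m)] + [candidate]

if __name__ == "__main__":
    front: List[ConfigScore] = []
    for c in [ConfigScore("A", 0.08, 1.2),
              ConfigScore("B", 0.10, 0.4),
              ConfigScore("C", 0.09, 1.5)]:   # C is dominated by A
        front = update_front(front, c)
    print([c.name for c in front])            # expected: ['A', 'B']
```

A configuration is "not dominated" by the state-of-the-art architectures when no competitor is simultaneously at least as accurate and at least as cheap, which is the sense in which the MOSA solutions remain on the trade-off front.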