2021
DOI: 10.7717/peerj-cs.338

Multi-objective simulated annealing for hyper-parameter optimization in convolutional neural networks

Abstract: In this study, we model CNN hyper-parameter optimization as a bi-criteria optimization problem, where the first objective is the classification accuracy and the second is the computational complexity, measured as the number of floating-point operations. For this bi-criteria problem, we develop a Multi-Objective Simulated Annealing (MOSA) algorithm that obtains high-quality solutions with respect to both objectives. CIFAR-10 is selected as the benchmark datas…
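The bi-criteria search the abstract describes can be sketched as a generic archive-based MOSA loop. This is a minimal illustration only, not the authors' exact algorithm: the `neighbor` and `evaluate` callbacks, the geometric cooling schedule, and the scalarized acceptance rule are all assumptions, with both objectives cast as minimization (e.g., error rate and FLOPs):

```python
import math
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def mosa(init, neighbor, evaluate, t0=1.0, cooling=0.95, steps=200):
    """Archive-based multi-objective simulated annealing (minimization sketch)."""
    current = init
    f_cur = evaluate(current)
    archive = [(current, f_cur)]           # non-dominated solutions found so far
    t = t0
    for _ in range(steps):
        cand = neighbor(current)
        f_cand = evaluate(cand)
        if dominates(f_cand, f_cur):
            accept = True                  # strictly better: always accept
        else:
            # scalarize the total worsening across objectives for the Metropolis test
            delta = sum(max(0.0, c - p) for c, p in zip(f_cand, f_cur))
            accept = random.random() < math.exp(-delta / t)
        if accept:
            current, f_cur = cand, f_cand
            if not any(dominates(f, f_cand) for _, f in archive):
                # drop archive entries the new point dominates, then add it
                archive = [(s, f) for s, f in archive if not dominates(f_cand, f)]
                archive.append((cand, f_cand))
        t *= cooling                       # geometric cooling schedule
    return archive
```

In an HPO setting, `neighbor` would perturb one hyper-parameter (e.g., filter count or layer depth) and `evaluate` would return (validation error, FLOPs); the archive approximates the Pareto front.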

Cited by 19 publications (10 citation statements)
References 23 publications
“…However, the proposed model is not built and scaled for real-world problems with larger datasets such as network data. The Multi-Objective Simulated Annealing (MOSA) algorithm [21] efficiently searches the objective space, outperforming the simulated annealing (SA) algorithm, with the caveat that computational complexity is treated as being as important as test accuracy. Hoopes et al. [22] proposed HyperMorph, a learning-based strategy that eliminates the need to tune hyperparameters during training, reducing computational time but limiting the ability to find optimal values.…”
Section: Related Work
confidence: 99%
“…A first set of quality metrics is related to the resulting Pareto front. Here, hypervolume is the most widely used; see Garrido and Hernández (2019) and Chatelain et al. (2007). Other metrics include the average distance (or generational distance) of the front to a reference set (such as the approximated true Pareto front obtained by exhaustive search, see Smithson et al. (2016), or an aggregated front, see Gülcü and Kuş (2021)); a coverage measure computed as the percentage of the solutions of an algorithm A dominated by the solutions of another algorithm B (Juang & Hsu, 2014; H. Li, Zhang, Tsang, & Ford, 2004); and metrics based on the shape of the Pareto front (Abdolshah et al., 2019) or its diversity (Juang & Hsu, 2014; H. Li et al., 2004).…”
Section: Quality Metrics for Comparing Multi-objective HPO Algorithms
confidence: 99%
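Two of the front-level metrics named above are simple to compute in two dimensions. The sketch below is a minimal illustration under the usual minimization convention; the function names and the rectangle-slicing scheme are my own, not taken from the cited works, and `hypervolume_2d` assumes the front is non-dominated:

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a non-dominated 2-D front w.r.t. a reference point
    (minimization): sum of rectangle slices between the front and ref."""
    pts = sorted(front)          # ascending in f1, hence descending in f2
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        hv += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return hv

def coverage(a, b):
    """C-metric: fraction of points in front b dominated by some point in a."""
    def dom(p, q):
        return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))
    return sum(any(dom(p, q) for p in a) for q in b) / len(b)
```

A larger hypervolume indicates a front that is both closer to the ideal point and wider; `coverage(a, b)` close to 1 means algorithm A's front largely dominates B's.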
“…Li et al., 2004). The latter can be computed using the spacing and the spread of the solutions: spacing evaluates the diversity of the Pareto points along a given front (Gülcü & Kuş, 2021), whereas spread evaluates the range of the objective-function values (see Zitzler, Deb, and Thiele (2000)). Some authors use performance measures that do not relate to the quality of the front obtained, e.g., execution time (Horn et al., 2017; Parsa et al., 2019; Richter et al., 2016), the number of performance evaluations (Parsa et al., 2019), CPU utilization on parallel computer architectures (Richter et al., 2016), measures that were not considered as an objective but are evaluated at the Pareto solutions (usually confusion-matrix-based measures for classification problems; see Salt et al. (2019)), or measures specific to the HPO algorithm used (e.g., the number of new points suggested per batch is used by Gupta, Shilton, Rana, and Venkatesh (2018) to evaluate the search executed during batch Bayesian optimization).…”
Section: Quality Metrics for Comparing Multi-objective HPO Algorithms
confidence: 99%
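The spacing and spread measures this snippet distinguishes can also be sketched briefly. This is an illustrative implementation of Schott's spacing and a simple per-objective range, not the exact formulas used in the cited papers; the L1 nearest-neighbor distance is an assumption:

```python
import math

def spacing(front):
    """Schott's spacing: standard deviation of nearest-neighbor distances
    along a front (0 means perfectly evenly spaced points)."""
    dists = []
    for i, p in enumerate(front):
        nearest = min(
            sum(abs(a - b) for a, b in zip(p, q))   # L1 distance between points
            for j, q in enumerate(front) if j != i
        )
        dists.append(nearest)
    mean = sum(dists) / len(dists)
    return math.sqrt(sum((mean - d) ** 2 for d in dists) / (len(dists) - 1))

def spread(front):
    """Per-objective extent (range) of the front's objective values."""
    return [max(f) - min(f) for f in zip(*front)]
```

Low spacing with large spread is the desirable combination: evenly distributed points covering a wide range of trade-offs.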
“…The main advantage of 1D-CNN is automatic feature extraction performed through its initial convolutional layers [10,13,34]. However, CNN has a high computational cost, and its architecture design is a difficult task [35].…”
Section: Introduction
confidence: 99%