Proceedings of the Genetic and Evolutionary Computation Conference 2021
DOI: 10.1145/3449639.3459277
Optimizing Loss Functions Through Multi-Variate Taylor Polynomial Parameterization

Abstract: Loss function optimization for neural networks has recently emerged as a new direction for metalearning, with Genetic Loss Optimization (GLO) providing a general approach for the discovery and optimization of such functions. GLO represents loss functions as trees that are evolved and further optimized using evolutionary strategies. However, searching in this space is difficult because most candidates are not valid loss functions. In this paper, a new technique, Multivariate Taylor expansion-based genetic loss-…
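The abstract's tree-based representation can be made concrete with a small, purely illustrative sketch: a candidate loss is an expression tree over the prediction and the target, evaluated recursively. The node set and encoding below are assumptions made for this sketch, not taken from the paper.

```python
import math

# Hypothetical expression-tree encoding of a loss, in the spirit of GLO's
# tree representation mentioned in the abstract. Each node is either a leaf
# string ("y_true" / "y_pred") or a pair (operator, children).
cross_entropy_like = ("neg", [("mul", ["y_true", ("log", ["y_pred"])])])

def eval_tree(node, y_true, y_pred):
    """Recursively evaluate a loss expression tree on one (target, prediction) pair."""
    if node == "y_true":
        return y_true
    if node == "y_pred":
        return y_pred
    op, children = node
    args = [eval_tree(child, y_true, y_pred) for child in children]
    return {"neg": lambda a: -a,
            "mul": lambda a, b: a * b,
            "log": lambda a: math.log(a)}[op](*args)

print(eval_tree(cross_entropy_like, 1.0, 0.9))  # ≈ 0.105, i.e. -1.0 * log(0.9)
```

Mutating and recombining subtrees in such a search illustrates why, as the abstract notes, many candidates fail to be valid loss functions (e.g. taking the log of a negative intermediate value).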

Cited by 22 publications (24 citation statements); references 34 publications.
“…Metalearning, aka learning to learn, and AutoML have been applied for a wide variety of purposes, as summarised in [17,21]. Of particular relevance is meta-learning of loss functions, which has been studied for various purposes including providing differentiable surrogates of non-differentiable objectives [19], optimising efficiency and asymptotic performance of learning [22,4,18,48,11,12], and improving robustness to train/test domain-shift [3,30]. We are interested in learning white-box losses, i.e., those that can be expressed as a short human-readable parametric equation, for efficiency and improved task-transferability compared to neural network alternatives [4,18,3,30], which tend to be less interpretable and need to be learned task-specifically.…”
Section: Meta-learning, AutoML and Loss Learning
confidence: 99%
“…We are interested in learning white-box losses, i.e., those that can be expressed as a short human-readable parametric equation, for efficiency and improved task-transferability compared to neural network alternatives [4,18,3,30], which tend to be less interpretable and need to be learned task-specifically. Meta-learning of white-box model components has been demonstrated for optimisers [47], activation functions [35], neural architectures [43], and losses for accelerating conventional supervised learning [11,12]. We are the first to demonstrate the value of automatic loss function discovery for general-purpose label-noise robust learning.…”
Section: Meta-learning, AutoML and Loss Learning
confidence: 99%
“…TaylorGLO parameterization represents a loss function as a modified third-degree Taylor polynomial. Such a parameterization has many desirable properties, such as smoothness and continuity, that make it amenable for evolution [8]. In TaylorGAN, there are three functions that need to be optimized jointly (using the notation described in Table 1):…”
Section: The TaylorGAN Approach
confidence: 99%
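As a rough illustration of the parameterization described above (not the paper's exact formulation), a third-degree bivariate Taylor polynomial in the target and the prediction can serve as a loss whose expansion centre and monomial coefficients together form the evolvable parameter vector. The parameter layout and variable names below are assumptions of this sketch.

```python
import numpy as np

def taylor_loss(y_true, y_pred, theta):
    """Third-degree bivariate Taylor polynomial used as a per-sample loss.

    theta holds the expansion centre (a, b) followed by one coefficient for
    every monomial (dy ** i) * (dp ** j) with i + j <= 3 (10 terms), giving a
    12-dimensional parameter vector. This layout is illustrative only.
    """
    a, b = theta[0], theta[1]
    coeffs = theta[2:]
    dy, dp = y_true - a, y_pred - b

    # Enumerate every monomial of total degree <= 3 in the two shifted inputs.
    terms = [dy**i * dp**j for i in range(4) for j in range(4 - i)]
    per_sample = sum(c * t for c, t in zip(coeffs, terms))
    return np.mean(per_sample)

# Any 12-dimensional real vector now defines one smooth, continuous candidate loss.
rng = np.random.default_rng(0)
theta = rng.normal(size=12)
print(taylor_loss(np.array([1.0, 0.0]), np.array([0.9, 0.2]), theta))
```

Because every parameter vector yields a smooth, continuous function, this search space sidesteps the invalid-candidate problem of tree-based representations, which is the property the statement above highlights.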
“…In this paper, such a technique is developed to evolve entirely new GAN formulations that outperform the standard Wasserstein loss. Leveraging the TaylorGLO loss-function parameterization approach [8], separate loss functions are constructed for the two GAN networks. A genetic algorithm is then used to optimize their parameters against two non-differentiable objectives.…”
Section: Introduction
confidence: 99%
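The statement above leaves the evolutionary loop itself abstract. Below is a minimal sketch of one way such a genetic algorithm could optimize the loss-function parameters; the `evaluate` callback, which would train the GAN with the candidate losses and return a scalar fitness combining the two non-differentiable objectives, is assumed here and left unimplemented, and all operators and hyperparameters are illustrative rather than the paper's.

```python
import numpy as np

def evolve_loss_parameters(evaluate, dim=12, pop_size=20, generations=50, seed=0):
    """Toy truncation-selection GA over real-valued loss parameters (maximizes fitness)."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, dim))
    for _ in range(generations):
        fitness = np.array([evaluate(theta) for theta in pop])
        # Truncation selection: keep the best quarter of the population as parents.
        parents = pop[np.argsort(fitness)[-(pop_size // 4):]]
        # Offspring: Gaussian mutations of randomly chosen parents.
        pop = (parents[rng.integers(len(parents), size=pop_size)]
               + 0.1 * rng.normal(size=(pop_size, dim)))
    final_fitness = [evaluate(theta) for theta in pop]
    return pop[int(np.argmax(final_fitness))]
```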
“…Moreover, we search over programs, which include non-neural operations and data structures, rather than just neural-network architectures, and decide what loss functions to use for training. Our work also resembles work in the AutoML community (Hutter et al., 2018) that searches in a space of programs, for example in the case of SAT solving (KhudaBukhsh et al., 2009) or auto-sklearn (Feurer et al., 2015), and concurrent work on learning loss functions to replace cross-entropy for training a fixed architecture on MNIST and CIFAR (Gonzalez & Miikkulainen, 2019; 2020). Although we took inspiration from ideas in that community (Jamieson & Talwalkar, 2016; Li et al., 2016), our algorithms specify both how to compute their outputs and their own optimization objectives in order to work well in synchrony with an expensive deep RL algorithm.…”
Section: Related Work
confidence: 99%