2023
DOI: 10.48550/arxiv.2302.06675
Preprint

Symbolic Discovery of Optimization Algorithms

Abstract: We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient search techniques to explore an infinite and sparse program space. To bridge the large generalization gap between proxy and target tasks, we also introduce program selection and simplification strategies. Our method discovers a simple and effective optimization algorithm, Lion (EvoLved Sign Momentum). It is more memory-efficient than Adam as it only keeps track of the momentum. …
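As context for the citation statements below, here is a minimal sketch of the Lion update rule described in the paper: the update direction is the sign of an interpolation between the momentum and the current gradient, weight decay is decoupled, and only a single momentum buffer is kept as optimizer state, which is where the memory advantage over Adam comes from. The function and argument names are illustrative, not the authors' reference implementation.

import numpy as np

def lion_update(param, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    # Update direction: sign of an interpolation between momentum and gradient,
    # so every coordinate moves by the same magnitude (lr), unlike Adam/AdamW.
    update = np.sign(beta1 * m + (1.0 - beta1) * grad)
    # Decoupled weight decay, applied directly to the parameters (AdamW-style).
    new_param = param - lr * (update + weight_decay * param)
    # Only one piece of optimizer state: the momentum buffer.
    new_m = beta2 * m + (1.0 - beta2) * grad
    return new_param, new_m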

Cited by 33 publications (40 citation statements)
References 28 publications
“…• Another question is, why not use an optimizer such as Lion [7] which does not divide updates by any value, and is therefore immune to the stuck-in-the-past scenario. We believe this may be a promising path forward.…”
Section: E StableAdamW Continued, E.1 Q and A (citation type: mentioning)
confidence: 99%
“…We study these two directions in the context of contrastive language-image pre-training (CLIP) [44]. We examine CLIP-style models because of their importance in computer vision: CLIP-style models reach state-ofthe-art performance on a wide range of image classification tasks [44,63,42,7] and underlie image generation methods such as DALL·E 2 [47] and Stable Diffusion [49]. Our contributions towards fast training and stable training are as follows.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Precisions and optimizers. In Table 18, we show that sufficiently pre-trained EVA-02 representations are robust enough that can be fine-tuned using various numerical precisions (e.g., fp16 and bf16) and optimizers (e.g., Lion [25], AdamW [64,84], and SGD [87]). Remarkably, the fine-tuning can be done using the SGD optimizer with only little performance drop.…”
Section: A.2 Additional Results for Image Classification (citation type: mentioning)
confidence: 99%
“…The final IN-1K fine-tuning for all-sized models (including EVA-02-Ti and -S) can be done without using strong regularization such as cutmix [141], mixup [143] and random erasing [146]. In the Appendix, we show that our pre-trained representations are robust enough that can be fine-tuned using various numerical precisions (e.g., fp16 and bf16) and optimizers (e.g., Lion [25], AdamW [64,84], and SGD [87]). Remarkably, the fine-tuning can be done even using the SGD optimizer with only 0.1-point performance drop.…”
Section: Image Classification (citation type: mentioning)
confidence: 98%
“…By default, AdamW [61], a variant of Adam which decouples the L 2 regularization and the weight decay, is the most widely used optimizer for Transformers. More recently, Google searches optimization algorithms and discovers a simple and effective optimizer called Lion [18]. Lion only keeps track of the momentum with the first-order gradient, and its update only considers the sign direction and has the same magnitude for each parameter, which is very different from the adaptive optimizers like AdamW.…”
Section: Optimization (citation type: mentioning)
confidence: 99%
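To make the contrast drawn in the last quote concrete, the sketch below compares one AdamW step with one Lion step: AdamW scales each coordinate by a second-moment estimate, so per-parameter step magnitudes differ and two state buffers are kept, whereas Lion applies a uniform-magnitude sign update and stores only a momentum buffer. Hyperparameter defaults are illustrative, not prescriptions from the paper.

import numpy as np

def adamw_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    # Two state buffers: first moment m and second moment v.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)  # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    # Per-parameter step size: the update is divided by sqrt of the second moment.
    param = param - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v

def lion_step(param, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.01):
    # One state buffer (momentum); the sign gives every coordinate the same magnitude.
    param = param - lr * (np.sign(beta1 * m + (1.0 - beta1) * grad)
                          + weight_decay * param)
    m = beta2 * m + (1.0 - beta2) * grad
    return param, m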