2020
DOI: 10.48550/arxiv.2006.14683
Preprint

MTAdam: Automatic Balancing of Multiple Training Loss Terms

Abstract: When training neural models, it is common to combine multiple loss terms. The balancing of these terms requires considerable human effort and is computationally demanding. Moreover, the optimal trade-off between the loss terms can change as training progresses, especially for adversarial terms. In this work, we generalize the Adam optimization algorithm to handle multiple loss terms. The guiding principle is that for every layer, the gradient magnitude of the terms should be balanced. To this end, the Multi-Term…
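
The abstract describes the core mechanism: compute each loss term's gradient separately, track a moving average of its magnitude per layer, and rescale the terms so their magnitudes match before an Adam-style update. Below is a minimal, hedged sketch of that idea in PyTorch; the function name `balanced_step`, the anchor-term convention, and the EMA decay `beta_mag` are illustrative assumptions, not the authors' released MTAdam implementation.

```python
import torch

def balanced_step(params, loss_terms, state, opt, beta_mag=0.999):
    """Rescale each term's per-parameter gradient toward the first (anchor)
    term's running magnitude, sum the terms, then let Adam apply its step."""
    grads_per_term = [
        torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
        for loss in loss_terms
    ]
    for p_idx, p in enumerate(params):
        combined = torch.zeros_like(p)
        for t_idx, grads in enumerate(grads_per_term):
            g = grads[p_idx]
            if g is None:
                continue
            # EMA of this term's gradient magnitude for this parameter tensor
            # ("per layer" in the abstract's terminology).
            key = (p_idx, t_idx)
            mag = g.norm()
            state[key] = beta_mag * state.get(key, mag) + (1 - beta_mag) * mag
            # Scale so every term matches the anchor term's running magnitude.
            scale = state[(p_idx, 0)] / (state[key] + 1e-12)
            combined += scale * g
        p.grad = combined
    opt.step()
    opt.zero_grad()

# Toy usage: two loss terms on a small linear model (illustrative only).
model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
state = {}
x, y = torch.randn(8, 4), torch.randn(8, 1)
pred = model(x)
balanced_step(list(model.parameters()),
              [torch.nn.functional.mse_loss(pred, y), pred.abs().mean()],
              state, opt)
```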

Cited by 3 publications (6 citation statements)
References 17 publications

“…Gradient direction-based methods that are designed for adapting auxiliary tasks to improve the target task, which will be detailed in Section A.1.1, including: GradSimilarity [7], GradSurgery [34], OL-AUX [17]. Multi-Task balancing methods that treat all tasks equally, which will be detailed in Section A.1.2, including: Uncertainty [14], GradNorm [4], DWA [19], MTAdam [23] and MGDA [27]. And three simple baselines.…”
Section: Methods (mentioning)
confidence: 99%
“…MTAdam [23] is an Adam-based optimizer that balances gradient magnitudes and then updates parameters according to the rule of Adam [15]. Following MTAdam, we also directly manipulate the gradient magnitudes, instead of weighting task losses like Uncertainty [14], GradNorm [4] and DWA [19].…”
Section: Appendix A.1 Relations of MetaBalance to Previous Methods (mentioning)
confidence: 99%
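
To illustrate the distinction drawn in the quote above between weighting task losses and manipulating gradient magnitudes directly, the snippet below contrasts the two on a toy model. The weights `w1`, `w2` and the norm-matching rule are illustrative assumptions, not the exact formulations of Uncertainty, GradNorm, DWA, MetaBalance, or MTAdam.

```python
import torch

model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
pred = model(x)
main_loss = torch.nn.functional.mse_loss(pred, y)
aux_loss = pred.pow(2).mean()

# (a) Loss weighting: scale the losses, then take a single gradient.
w1, w2 = 1.0, 0.3  # hypothetical fixed weights
(w1 * main_loss + w2 * aux_loss).backward(retain_graph=True)
model.zero_grad()  # discard this gradient, only here to contrast with (b)

# (b) Direct gradient manipulation: take per-term gradients and rescale the
#     auxiliary gradient to the main gradient's norm before summing.
params = list(model.parameters())
g_main = torch.autograd.grad(main_loss, params, retain_graph=True)
g_aux = torch.autograd.grad(aux_loss, params)
for p, gm, ga in zip(params, g_main, g_aux):
    p.grad = gm + ga * (gm.norm() / (ga.norm() + 1e-12))
```
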
“…The regularization parameter λ is fixed to 5 for all experiments. The weights γ1, γ2, and γ3 of different losses in the objective of PROLIN (see Section 5.2.4) are adjusted dynamically during optimization using the technique in [40].…”
Section: Approaches (mentioning)
confidence: 99%
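
PROLIN's actual loss terms are not reproduced here, but the quote describes three loss weights that are re-tuned on the fly rather than fixed by hand. The sketch below shows what such dynamic weighting could look like under a magnitude-balancing rule in the spirit of [40]; the three toy losses and the match-the-first-term heuristic are stand-ins, not PROLIN's objective.

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(16, 4)

for step in range(100):
    out = model(x)
    # Three stand-in loss terms; PROLIN's real terms would go here.
    losses = [out.pow(2).mean(), out.abs().mean(), (out - 1).pow(2).mean()]
    params = list(model.parameters())
    # Measure each term's current total gradient norm.
    norms = []
    for loss in losses:
        g = torch.autograd.grad(loss, params, retain_graph=True)
        norms.append(torch.sqrt(sum(gi.pow(2).sum() for gi in g)).item())
    # gamma_i is chosen so every term contributes a comparable gradient norm.
    gammas = [norms[0] / (n + 1e-12) for n in norms]
    total = sum(g_i * l_i for g_i, l_i in zip(gammas, losses))
    opt.zero_grad()
    total.backward()
    opt.step()
```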