2020
DOI: 10.48550/arxiv.2006.14683
Preprint

MTAdam: Automatic Balancing of Multiple Training Loss Terms

Abstract: When training neural models, it is common to combine multiple loss terms. The balancing of these terms requires considerable human effort and is computationally demanding. Moreover, the optimal trade-off between the loss terms can change as training progresses, especially for adversarial terms. In this work, we generalize the Adam optimization algorithm to handle multiple loss terms. The guiding principle is that for every layer, the gradient magnitude of the terms should be balanced. To this end, the Multi-Term…
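
The abstract describes the core mechanism: compute each loss term's gradient separately, track a moving average of its magnitude per layer, and rescale the terms so their magnitudes match before an Adam-style update. Below is a minimal, hedged sketch of that idea in PyTorch; the function name `balanced_step`, the anchor-term convention, and the EMA decay `beta_mag` are illustrative assumptions, not the authors' released MTAdam implementation.

```python
import torch

def balanced_step(params, loss_terms, state, opt, beta_mag=0.999):
    """Rescale each term's per-parameter gradient toward the first (anchor)
    term's running magnitude, sum the terms, then let Adam apply its step."""
    grads_per_term = [
        torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
        for loss in loss_terms
    ]
    for p_idx, p in enumerate(params):
        combined = torch.zeros_like(p)
        for t_idx, grads in enumerate(grads_per_term):
            g = grads[p_idx]
            if g is None:
                continue
            # EMA of this term's gradient magnitude for this parameter tensor
            # ("per layer" in the abstract's terminology).
            key = (p_idx, t_idx)
            mag = g.norm()
            state[key] = beta_mag * state.get(key, mag) + (1 - beta_mag) * mag
            # Scale so every term matches the anchor term's running magnitude.
            scale = state[(p_idx, 0)] / (state[key] + 1e-12)
            combined += scale * g
        p.grad = combined
    opt.step()
    opt.zero_grad()

# Toy usage: two loss terms on a small linear model (illustrative only).
model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
state = {}
x, y = torch.randn(8, 4), torch.randn(8, 1)
pred = model(x)
balanced_step(list(model.parameters()),
              [torch.nn.functional.mse_loss(pred, y), pred.abs().mean()],
              state, opt)
```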

Cited by 3 publications (6 citation statements)
References 17 publications

“…Gradient direction-based methods that are designed for adapting auxiliary tasks to improve the target task, which will be detailed in Section A.1.1, including: GradSimilarity [7], GradSurgery [34], OL-AUX [17]. Multi-Task balancing methods that treat all tasks equally, which will be detailed in Section A.1.2, including: Uncertainty [14], GradNorm [4], DWA [19], MTAdam [23] and MGDA [27]. And three simple baselines.…”
Section: Methods (mentioning)
confidence: 99%
“…MTAdam [23] is an Adam-based optimizer that balances gradient magnitudes and then updates parameters according to the rule of Adam [15]. Following MTAdam, we also directly manipulate the gradient magnitudes, instead of weighting task losses like Uncertainty [14], GradNorm [4] and DWA [19].…”
Section: Appendix A.1 Relations of MetaBalance to Previous Methods (mentioning)
confidence: 99%
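
To illustrate the distinction drawn in the quote above between weighting task losses and manipulating gradient magnitudes directly, the snippet below contrasts the two on a toy model. The weights `w1`, `w2` and the norm-matching rule are illustrative assumptions, not the exact formulations of Uncertainty, GradNorm, DWA, MetaBalance, or MTAdam.

```python
import torch

model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
pred = model(x)
main_loss = torch.nn.functional.mse_loss(pred, y)
aux_loss = pred.pow(2).mean()

# (a) Loss weighting: scale the losses, then take a single gradient.
w1, w2 = 1.0, 0.3  # hypothetical fixed weights
(w1 * main_loss + w2 * aux_loss).backward(retain_graph=True)
model.zero_grad()  # discard this gradient, only here to contrast with (b)

# (b) Direct gradient manipulation: take per-term gradients and rescale the
#     auxiliary gradient to the main gradient's norm before summing.
params = list(model.parameters())
g_main = torch.autograd.grad(main_loss, params, retain_graph=True)
g_aux = torch.autograd.grad(aux_loss, params)
for p, gm, ga in zip(params, g_main, g_aux):
    p.grad = gm + ga * (gm.norm() / (ga.norm() + 1e-12))
```
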
“…The regularization parameter λ is fixed to 5 for all experiments. The weights γ1, γ2, and γ3 of different losses in the objective of PROLIN (see Section 5.2.4) are adjusted dynamically during optimization using the technique in [40].…”
Section: Approaches (mentioning)
confidence: 99%
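
PROLIN's actual loss terms are not reproduced here, but the quote describes three loss weights that are re-tuned on the fly rather than fixed by hand. The sketch below shows what such dynamic weighting could look like under a magnitude-balancing rule in the spirit of [40]; the three toy losses and the match-the-first-term heuristic are stand-ins, not PROLIN's objective.

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(16, 4)

for step in range(100):
    out = model(x)
    # Three stand-in loss terms; PROLIN's real terms would go here.
    losses = [out.pow(2).mean(), out.abs().mean(), (out - 1).pow(2).mean()]
    params = list(model.parameters())
    # Measure each term's current total gradient norm.
    norms = []
    for loss in losses:
        g = torch.autograd.grad(loss, params, retain_graph=True)
        norms.append(torch.sqrt(sum(gi.pow(2).sum() for gi in g)).item())
    # gamma_i is chosen so every term contributes a comparable gradient norm.
    gammas = [norms[0] / (n + 1e-12) for n in norms]
    total = sum(g_i * l_i for g_i, l_i in zip(gammas, losses))
    opt.zero_grad()
    total.backward()
    opt.step()
```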