2020
DOI: 10.48550/arxiv.2002.11328
Preprint

Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

Cited by 12 publications (27 citation statements)
References 16 publications
“…Sources of diversity include using different initializations [32], hyperparameters [51] or network architectures [56] for the ensemble components, or training the ensemble with additional loss terms [40,26,54]. However, under distribution shifts, reduction in performance can stem from an increase in the bias, rather than the variance term [55]. Our set of middle domains yields a more diverse ensemble by design and promotes invariance to different distortions to keep bias low (Fig.…”
Section: Related Work (mentioning)
confidence: 99%
“…Experimentation is amplified by label noise. With the observation of unimodal variance (Neal et al., 2018), Yang et al. (2020) decompose the risk into bias and variance and posit that double descent arises because the bell-shaped variance curve rises faster than the bias decreases.…”
Section: Related Work (mentioning)
confidence: 99%
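As a hedged illustration of the mechanism described in the quote above (not code from either cited paper), the Python sketch below uses made-up curves, a monotonically decreasing bias and a bell-shaped variance peaking at a hypothetical interpolation threshold, to show that their sum is non-monotone, i.e. traces a double-descent shape. All parameter values are illustrative assumptions.

```python
# Illustrative sketch only: a monotonically decreasing bias plus a unimodal
# (bell-shaped) variance can yield a double-descent risk curve.
import numpy as np

width = np.linspace(1, 100, 400)      # hypothetical model-size axis
bias2 = 1.0 / np.sqrt(width)          # assumed: bias decreases monotonically
# assumed: variance is unimodal, peaking near a hypothetical interpolation threshold (width ~ 15)
variance = 0.8 * np.exp(-((np.log(width) - np.log(15)) ** 2) / 0.5)
risk = bias2 + variance               # risk = bias^2 + variance

# The risk first falls, then rises while the variance climbs faster than the
# bias falls, then falls again past the variance peak: a double-descent curve.
peak_idx = np.argmax(variance)
print(f"variance peaks at width ~{width[peak_idx]:.1f}")
print(f"risk is non-monotone: {bool(np.any(np.diff(risk) > 0))}")
```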
“…In this section, we follow Yang et al. (2020) and decompose the loss into bias and variance. Namely, let CE denote the cross-entropy loss, T a random variable representing the training set, π the true one-hot label, π̄ the average log-probability after normalization, and π̂ the output of the neural network.…”
Section: Bias Variance Decomposition (mentioning)
confidence: 99%
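For readers who want the explicit form, the LaTeX sketch below spells out the decomposition the quoted passage refers to. The notation (π for the one-hot label, π̂ for the network output, π̄ for the normalized average log-probability) follows the restored symbols above and is an assumption; it may differ in detail from the citing paper.

```latex
% Sketch (not verbatim from the cited papers) of the bias-variance
% decomposition for the cross-entropy / KL loss described above.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
Let $\pi$ denote the true one-hot label, $\hat{\pi}$ the output of a network
trained on a random training set $T$, and
$\bar{\pi}(x) \propto \exp\!\big(\mathbb{E}_T[\log \hat{\pi}(x)]\big)$
the normalized average log-probability. Then
\begin{equation}
  \mathbb{E}_T\!\left[ D_{\mathrm{KL}}\!\left(\pi \,\|\, \hat{\pi}\right) \right]
  = \underbrace{D_{\mathrm{KL}}\!\left(\pi \,\|\, \bar{\pi}\right)}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_T\!\left[ D_{\mathrm{KL}}\!\left(\bar{\pi} \,\|\, \hat{\pi}\right) \right]}_{\text{variance}} .
\end{equation}
\end{document}
```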