2020
DOI: 10.48550/arxiv.2002.11328
Preprint

Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

Cited by 12 publications (27 citation statements)
References 16 publications
“…Sources of diversity include using different initializations [32], hyperparameters [51] or network architectures [56] for the ensemble components, or training the ensemble with additional loss terms [40,26,54]. However, under distribution shifts, reduction in performance can stem from an increase in the bias, rather than the variance term [55]. Our set of middle domains yields a more diverse ensemble by design and promotes invariance to different distortions to keep bias low (Fig.…”
Section: Related Work (mentioning)
confidence: 99%
“…Experimentation is amplified by label noise. With the observation of unimodal variance (Neal et al., 2018), Yang et al. (2020) decompose the risk into bias and variance and posit that double descent arises because the bell-shaped variance curve rises faster than the bias decreases.…”
Section: Related Work (mentioning)
confidence: 99%
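As a hedged illustration of the mechanism described in the quote above (not code from either cited paper), the Python sketch below uses made-up curves, a monotonically decreasing bias and a bell-shaped variance peaking at a hypothetical interpolation threshold, to show that their sum is non-monotone, i.e. traces a double-descent shape. All parameter values are illustrative assumptions.

```python
# Illustrative sketch only: a monotonically decreasing bias plus a unimodal
# (bell-shaped) variance can yield a double-descent risk curve.
import numpy as np

width = np.linspace(1, 100, 400)      # hypothetical model-size axis
bias2 = 1.0 / np.sqrt(width)          # assumed: bias decreases monotonically
# assumed: variance is unimodal, peaking near a hypothetical interpolation threshold (width ~ 15)
variance = 0.8 * np.exp(-((np.log(width) - np.log(15)) ** 2) / 0.5)
risk = bias2 + variance               # risk = bias^2 + variance

# The risk first falls, then rises while the variance climbs faster than the
# bias falls, then falls again past the variance peak: a double-descent curve.
peak_idx = np.argmax(variance)
print(f"variance peaks at width ~{width[peak_idx]:.1f}")
print(f"risk is non-monotone: {bool(np.any(np.diff(risk) > 0))}")
```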
“…In this section, we follow Yang et al. (2020) and decompose the loss into bias and variance. Namely, let CE denote the cross-entropy loss, T a random variable representing the training set, π the true one-hot label, π̄ the average log-probability after normalization, and π̂ the output of the neural network.…”
Section: Bias Variance Decomposition (mentioning)
confidence: 99%
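For readers who want the explicit form, the LaTeX sketch below spells out the decomposition the quoted passage refers to. The notation (π for the one-hot label, π̂ for the network output, π̄ for the normalized average log-probability) follows the restored symbols above and is an assumption; it may differ in detail from the citing paper.

```latex
% Sketch (not verbatim from the cited papers) of the bias-variance
% decomposition for the cross-entropy / KL loss described above.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
Let $\pi$ denote the true one-hot label, $\hat{\pi}$ the output of a network
trained on a random training set $T$, and
$\bar{\pi}(x) \propto \exp\!\big(\mathbb{E}_T[\log \hat{\pi}(x)]\big)$
the normalized average log-probability. Then
\begin{equation}
  \mathbb{E}_T\!\left[ D_{\mathrm{KL}}\!\left(\pi \,\|\, \hat{\pi}\right) \right]
  = \underbrace{D_{\mathrm{KL}}\!\left(\pi \,\|\, \bar{\pi}\right)}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_T\!\left[ D_{\mathrm{KL}}\!\left(\bar{\pi} \,\|\, \hat{\pi}\right) \right]}_{\text{variance}} .
\end{equation}
\end{document}
```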