2018
DOI: 10.48550/arxiv.1806.00468
Preprint

Implicit Bias of Gradient Descent on Linear Convolutional Networks

Abstract: We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain. This is in contrast to fully connected linear networks, where, regardless of depth, gradient descent converges to the $\ell_2$ maximum margin solution.
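
As a way of unpacking the abstract, the contrast can be written as two constrained max-margin problems over linearly separable data $\{(x_n, y_n)\}$. This is only a sketch assembled from the abstract's wording: the loss, the meaning of "full width", and the exact sense of convergence for the convolutional case are as stated in the paper, and $\widehat{\beta}$ denotes the discrete Fourier transform of the end-to-end linear predictor $\beta$.

\[
\begin{aligned}
&\text{fully connected, any depth:} &&
\bar{\beta}_{\mathrm{fc}} \;\propto\; \arg\min_{\beta}\ \|\beta\|_2
\quad \text{s.t. } y_n\, \beta^{\top} x_n \ge 1 \ \ \forall n, \\[4pt]
&\text{full-width convolutional, depth } L: &&
\bar{\beta}_{\mathrm{conv}} \;\propto\; \arg\min_{\beta}\ \|\widehat{\beta}\|_{2/L}^{2/L}
\quad \text{s.t. } y_n\, \beta^{\top} x_n \ge 1 \ \ \forall n,
\end{aligned}
\]

where $\|\widehat{\beta}\|_{p}^{p} = \sum_{k} |\widehat{\beta}[k]|^{p}$ is the bridge penalty. The point of the contrast is that depth enters only through the parametrization: the fully connected penalty is the same $\ell_2$ norm at every depth, whereas for convolutional networks the exponent $2/L \le 1$ (for $L \ge 2$) shrinks as depth grows, so the penalty increasingly favors predictors supported on few frequencies.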

Cited by 18 publications (55 citation statements). References 9 publications.
“…Deep linear networks (FCNs and CNNs) similar to our CNN toy example have been studied in the literature [4,16,20,37]. These studies use different approaches and assumptions and do not discuss the target shift mechanism which applies also for non-linear CNNs.…”
Section: A Additional Related Work
confidence: 99%
“…The results are shown in Fig 1 where we compare the theoretical predictions given by the solutions of the self-consistent equation (16) to the empirical values of α obtained by training actual CNNs and averaging their outputs across the ensemble. As n grows, the two converge to the identity line (dashed black line).…”
Section: Numerical Verification
confidence: 99%
“…Efforts to explain the effectiveness of gradient descent in deep learning have uncovered an exciting possibility: it not only finds solutions with low error, but also biases the search for low complexity solutions which generalize well (Zhang et al., 2017; Bartlett et al., 2017; Soudry et al., 2017; Gunasekar et al., 2018).…”
Section: Introduction
confidence: 99%
“…In defiance of the classical bias-variance trade-off, the performance of these interpolating classifiers continuously improves as the number of parameters increases well beyond the number of training samples [3][4][5][6]. Despite recent progress in describing the implicit bias of stochastic gradient descent towards "good" minima [7][8][9][10][11][12], and the detailed analysis of solvable models of learning [13][14][15][16][17][18][19][20][21][22][23][24], the mechanisms underlying this "benign overfitting" [25] in DNNs remain partially unclear, especially since "bad" local minima exist in the optimisation landscape of DNNs [26].…”
Section: Introduction
confidence: 99%