2022
DOI: 10.48550/arxiv.2203.01238
Preprint

On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features

Abstract: When training deep neural networks for classification tasks, an intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features, where (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero. This phenomenon is called Neural Collapse (NC), which seems to take place regardless of the choice of loss function…
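
For reference, the Simplex ETF structure described in point (i) has a standard closed form in the NC literature (the scaling α and the partial orthogonal matrix P below are conventional notation, not taken from this abstract):

\[
M \;=\; \alpha\, P\, \sqrt{\tfrac{K}{K-1}} \left( I_K - \tfrac{1}{K}\,\mathbf{1}_K \mathbf{1}_K^{\top} \right),
\qquad \alpha > 0,\;\; P \in \mathbb{R}^{d \times K},\;\; P^{\top} P = I_K,
\]

so that any two columns \(m_i, m_j\) (the directions to which the class means, and likewise the classifier rows, collapse) satisfy \(\langle m_i, m_j\rangle / (\|m_i\|\,\|m_j\|) = 1\) if \(i = j\) and \(-\tfrac{1}{K-1}\) otherwise, where K is the number of classes.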

Cited by 2 publications (18 citation statements). References 31 publications (51 reference statements).

“…Prior art and related works on NC. The empirical NC phenomenon has inspired a recent line of theoretical studies on understanding why it occurs [17,21,44,47,71,86,87]. Like ours, most of these works studied the problem under the UFM.…”
Section: Motivations and Contributions (mentioning)
confidence: 91%
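
For context, the unconstrained features model (UFM) mentioned in this excerpt treats the last-layer features as free optimization variables, decoupled from the network backbone. A minimal sketch of the regularized MSE formulation studied in this line of work (the symbols W, H, b, Y and the λ's are standard notation, not quoted from the excerpt):

\[
\min_{W,\, H,\, \mathbf{b}} \;\; \frac{1}{2N} \left\| W H + \mathbf{b}\,\mathbf{1}_N^{\top} - Y \right\|_F^2
\;+\; \frac{\lambda_W}{2}\,\|W\|_F^2
\;+\; \frac{\lambda_H}{2}\,\|H\|_F^2
\;+\; \frac{\lambda_b}{2}\,\|\mathbf{b}\|_2^2,
\]

where \(H \in \mathbb{R}^{d \times N}\) stacks the N last-layer features as columns, \(W \in \mathbb{R}^{K \times d}\) is the linear classifier, \(\mathbf{b} \in \mathbb{R}^{K}\) is the bias, and \(Y \in \mathbb{R}^{K \times N}\) collects the one-hot labels. Analyses under the UFM characterize the global minimizers (and, in landscape studies, the absence of spurious local minima) of objectives of this form, with NC configurations emerging as the global optima.
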
“…Such a phenomenon is referred to as Neural Collapse (NC) [50], which has been shown empirically to persist across a broad range of canonical classification problems, on different loss functions (e.g., cross-entropy (CE) [16,50,87], mean-squared error (MSE) [71,86], and supervised contrastive (SC) losses [21]), on different neural network architectures (e.g., VGG [63], ResNet [24], and DenseNet [28]), and on a variety of standard datasets (such as MNIST [39], CIFAR [35], and ImageNet [12]). Recently, in independent lines of research, many works have been devoted to learning maximally compact and separated features; see, e.g., [13,42,43,51,52,60,72,73,76].…”
Section: Average CE Loss, Average Accuracy, No Normalization (mentioning)
confidence: 99%
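
The "maximally compact and separated features" mentioned at the end of this excerpt are usually quantified with the NC metrics from this literature; a commonly used pair, written here in standard notation rather than quoted from the excerpt, is

\[
\mathcal{NC}_1 \;=\; \frac{1}{K}\,\operatorname{tr}\!\left( \Sigma_W\, \Sigma_B^{\dagger} \right),
\qquad
\mathcal{NC}_2 \;=\; \left\| \frac{\bar{M} \bar{M}^{\top}}{\|\bar{M} \bar{M}^{\top}\|_F} \;-\; \frac{1}{\sqrt{K-1}} \left( I_K - \frac{1}{K}\,\mathbf{1}_K \mathbf{1}_K^{\top} \right) \right\|_F,
\]

where \(\Sigma_W\) and \(\Sigma_B\) are the within- and between-class covariance matrices of the last-layer features, \(\Sigma_B^{\dagger}\) is the Moore–Penrose pseudoinverse, and \(\bar{M}\) collects the globally centered class means as columns; \(\mathcal{NC}_1 \to 0\) captures within-class variability collapse and \(\mathcal{NC}_2 \to 0\) captures convergence of the class means to a simplex ETF.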