2021
DOI: 10.48550/arxiv.2106.02073
Preprint

Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path

Abstract: Recent work [Papyan, Han, and Donoho, 2020] discovered a phenomenon called Neural Collapse (NC) that occurs pervasively in today's deep net training paradigm of driving cross-entropy loss towards zero. In this phenomenon, the last-layer features collapse to their class-means, both the classifiers and class-means collapse to the same Simplex Equiangular Tight Frame (ETF), and the behavior of the last-layer classifier converges to that of the nearest-class-mean decision rule. Since then, follow-up works such as Mixon…
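The Simplex ETF named in the abstract can be constructed and checked numerically. A minimal sketch (the class count `K = 4` is an arbitrary illustrative choice, not from the paper): the K class vectors have equal norms and pairwise cosine similarity exactly -1/(K-1).

```python
import numpy as np

K = 4  # number of classes (illustrative choice)

# Simplex ETF: columns of M are K equal-norm vectors that are
# maximally and equally separated, with pairwise cosine -1/(K-1).
M = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

cols = M / np.linalg.norm(M, axis=0)         # unit-normalize columns
G = cols.T @ cols                            # Gram matrix of cosines
off_diag = G[~np.eye(K, dtype=bool)]
print(np.allclose(off_diag, -1 / (K - 1)))   # True: the frame is equiangular
```

Under NC, both the last-layer classifier rows and the class-mean matrix (after centering) align with such a frame, up to rotation and scaling.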

Cited by 7 publications (11 citation statements)
References 19 publications (45 reference statements)
“…While most existing papers consider cross-entropy loss, in this paper we focus on the mean squared error (MSE) loss, which has recently been shown to be effective for classification tasks as well (Hui & Belkin, 2020). (We note that the occurrence of neural collapse when training practical DNNs with MSE loss, and its positive effects on their performance, have been shown empirically in a very recent paper (Han et al, 2021).) We start by analyzing the (plain) UFM, showing that for the regularized MSE loss the collapsed features can be more structured than in the cross-entropy case (e.g., they may also possess orthogonality), which also affects the structure of the weights.…”
Section: Introduction
confidence: 99%
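The MSE-for-classification setting referenced above (Hui & Belkin, 2020) amounts to regressing features onto one-hot class targets. A minimal sketch on hypothetical toy data (the class count, dimensions, and noise scale are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: K well-separated Gaussian clusters.
K, d, n = 3, 5, 60
means = rng.normal(size=(K, d)) * 3.0
y = rng.integers(0, K, size=n)
X = means[y] + 0.1 * rng.normal(size=(n, d))
Y = np.eye(K)[y]                      # one-hot targets

# Linear classifier fit by MSE: least-squares solution of min_W ||X W - Y||^2.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = (X @ W).argmax(axis=1)         # predict by largest regression score
print((pred == y).mean())             # training accuracy on this separable toy set
```

On well-separated data the argmax of the MSE-fit scores recovers the labels, which is the sense in which squared loss can serve classification.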
“…Like ours, most of these works studied the problem under the UFM. In particular, despite the nonconvexity, recent works showed that the only global solutions are NC solutions for a variety of nonconvex training losses (e.g., CE [17,44,87], MSE [71,86], SC losses [21]) and different problem formulations (e.g., penalized, constrained, and unconstrained) [23,44,71,86,87]. Recently, this study has been extended to deeper models with the MSE training loss [71].…”
Section: Motivations and Contributions
confidence: 96%
“…More specifically, we prove that every local minimizer is a global solution satisfying the NC properties, and all the other critical points exhibit directions with negative curvature. Our analysis for the manifold setting is based upon a nontrivial extension of recent studies of NC with penalized formulations [23,71,86,87], which could be of independent interest. Our work brings new tools from Riemannian optimization for analyzing the optimization landscapes of training deep networks with the increasingly common practice of feature normalization.…”
Section: Motivations and Contributions
confidence: 99%
“…Robustness of learned representations against label noise. Recently, a line of work showed an intriguing and universal phenomenon of learned deep representations under natural settings [90][91][92][93]: the last-layer representations of each class collapse to a single dimension. However, the collapsed representation loses the variability of the data and is vulnerable to corruptions such as label noise.…”
Section: Limitations and Future Directions
confidence: 99%
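The collapse of each class's representations, as described above, is commonly quantified by the NC1 metric of Papyan, Han & Donoho (2020): the trace of the within-class scatter times the pseudoinverse of the between-class scatter. A sketch on hypothetical toy features (dimensions and noise scale are illustrative assumptions) where collapse has nearly occurred:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical collapsed features: tight clusters around K class means.
K, d, n_per = 3, 4, 50
class_means = rng.normal(size=(K, d)) * 5.0
feats = np.concatenate(
    [class_means[k] + 0.01 * rng.normal(size=(n_per, d)) for k in range(K)]
)
labels = np.repeat(np.arange(K), n_per)

mu_g = feats.mean(axis=0)                       # global mean
mu_c = np.stack([feats[labels == k].mean(axis=0) for k in range(K)])

Sigma_W = sum(np.cov(feats[labels == k].T, bias=True) for k in range(K)) / K
Sigma_B = (mu_c - mu_g).T @ (mu_c - mu_g) / K

# NC1 metric: tr(Sigma_W Sigma_B^+) / K, near zero under collapse.
nc1 = np.trace(Sigma_W @ np.linalg.pinv(Sigma_B)) / K
print(nc1 < 1e-3)   # True here: within-class scatter is negligible
```

When features instead retain per-class variability, Sigma_W stays comparable to Sigma_B and this ratio remains bounded away from zero.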