“…Such a phenomenon is referred to as Neural Collapse (NC) [50], which has been shown empirically to persist across a broad range of canonical classification problems, under different loss functions (e.g., cross-entropy (CE) [16,50,87], mean-squared error (MSE) [71,86], and supervised contrastive (SC) losses [21]), different neural network architectures (e.g., VGG [63], ResNet [24], and DenseNet [28]), and a variety of standard datasets (such as MNIST [39], CIFAR [35], and ImageNet [12]). Recently, in independent lines of research, many works have been devoted to learning maximally compact and separated features; see, e.g., [13,42,43,51,52,60,72,73,76].…”
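As a rough illustration (not taken from the cited works), the notion of "maximally compact and separated" features can be quantified by comparing within-class to between-class variability of the last-layer features; Neural Collapse corresponds to this ratio shrinking toward zero during terminal training. The following is a minimal NumPy sketch under those assumptions, with toy random features standing in for penultimate-layer activations; the function name and metric choice (a simple trace ratio rather than any specific measure used in the references) are illustrative only.

```python
# Hypothetical sketch: within-/between-class variability ratio of learned features.
# A value near zero means per-class features have collapsed onto their class means
# (compact) while the class means remain spread apart (separated).
import numpy as np

def variability_ratio(features: np.ndarray, labels: np.ndarray) -> float:
    """Return trace(Sigma_W) / trace(Sigma_B) for features grouped by label."""
    d = features.shape[1]
    global_mean = features.mean(axis=0)
    sigma_w = np.zeros((d, d))   # within-class scatter
    sigma_b = np.zeros((d, d))   # between-class scatter
    n = len(features)
    for c in np.unique(labels):
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        centered = class_feats - class_mean
        sigma_w += centered.T @ centered / n
        diff = (class_mean - global_mean)[:, None]
        sigma_b += (diff @ diff.T) * len(class_feats) / n
    return float(np.trace(sigma_w) / np.trace(sigma_b))

# Toy usage: 1000 random 64-dim "features" for 10 classes, with class-dependent shifts.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)
features = rng.normal(size=(1000, 64)) + 5.0 * np.eye(10, 64)[labels]
print(f"trace(Sigma_W)/trace(Sigma_B) = {variability_ratio(features, labels):.3f}")
```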