2020
DOI: 10.48550/arxiv.2005.08054
Preprint

Classification vs regression in overparameterized regimes: Does the loss function matter?

Abstract: We compare classification and regression tasks in the overparameterized linear model with Gaussian features. On the one hand, we show that with sufficient overparameterization all training points are support vectors: solutions obtained by least-squares minimum-norm interpolation, typically used for regression, are identical to those produced by the hard-margin support vector machine (SVM) that minimizes the hinge loss, typically used for training classifiers. On the other hand, we show that there exist regimes…
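
As a quick illustration of the abstract's central claim, the following numpy sketch (with illustrative dimensions n = 30 and d = 3000, not taken from the paper) constructs the minimum-norm interpolator of random ±1 labels on isotropic Gaussian features and checks the KKT condition under which it coincides with the hard-margin SVM solution: every implied dual coefficient is positive, i.e. every training point is a support vector.

```python
# Minimal numpy sketch (illustrative dimensions, not taken from the paper):
# with Gaussian features and d >> n, the minimum-norm interpolator of +/-1
# labels tends to satisfy the hard-margin SVM's KKT conditions, so the two
# solutions coincide and every training point is a support vector.
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 3000                       # heavy overparameterization: d >> n
X = rng.standard_normal((n, d))       # isotropic Gaussian features
y = rng.choice([-1.0, 1.0], size=n)   # arbitrary binary labels

# Minimum-norm interpolator: w = X^T (X X^T)^{-1} y, so y_i <x_i, w> = 1 exactly.
c = np.linalg.solve(X @ X.T, y)
w = X.T @ c

# w equals the hard-margin SVM solution iff the implied dual variables
# alpha_i = y_i * c_i are all positive (every point then sits exactly on the
# margin, i.e. all training points are support vectors).
alpha = y * c
print("max |margin - 1| over training points:", np.max(np.abs(y * (X @ w) - 1)))
print("all training points are support vectors:", bool(np.all(alpha > 0)))
```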

Cited by 20 publications (70 citation statements)
References 22 publications

“…Overparameterized ML and double-descent The phenomenon of double-descent was first discovered by [6]. This paper and subsequent works on this topic [4,33,32,30,11] emphasize the importance of the right prior (sometimes referred to as inductive bias or regularization) to avail the benefits of overparameterization. However, an important question that arises is: where does this prior come from?…”
Section: Related Work
Citation type: mentioning
confidence: 99%
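
The double-descent shape referred to in this excerpt can be seen in a small simulation. The sketch below is an illustrative toy (the dimensions, noise level, and coefficient prior are assumptions, not taken from the cited works): it sweeps the number of features p used by a minimum-norm least-squares fit past the sample size n, and the test error typically spikes near the interpolation threshold p = n before falling again as p grows.

```python
# Illustrative double-descent toy (assumed dimensions and noise level, not
# from the cited works): test MSE of the minimum-norm least-squares fit as
# the number of features p sweeps past the sample size n.
import numpy as np

rng = np.random.default_rng(1)
n, D, n_test = 50, 400, 2000
beta = rng.standard_normal(D) / np.sqrt(D)       # true coefficients, spread out
X_tr, X_te = rng.standard_normal((n, D)), rng.standard_normal((n_test, D))
y_tr = X_tr @ beta + 0.5 * rng.standard_normal(n)
y_te = X_te @ beta + 0.5 * rng.standard_normal(n_test)

for p in (10, 25, 45, 50, 55, 100, 200, 400):    # sweep through p = n = 50
    w = np.linalg.pinv(X_tr[:, :p]) @ y_tr       # min-norm least-squares fit
    mse = np.mean((X_te[:, :p] @ w - y_te) ** 2)
    print(f"p = {p:4d}  test MSE = {mse:.3f}")
```
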
“…Recent literature in ML theory has posited that overparameterization can be beneficial to generalization in traditional single-task setups for both regression [29,40,4,33,30] and classification [32,31] problems. Empirical literature in deep learning suggests that overparameterization is of interest for both phases of meta-learning as well.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…We next analyze the classification error of the minimum-Hilbert-norm interpolator of binary labels via a bias-variance decomposition inspired by the work of [17]. We demonstrate an asymptotic separation between the regression and classification tasks akin to that provided in [15]. Although good regression performance implies good classification performance (see Figure 1(b)), the reverse is not true; we characterize regimes in which classification is statistically consistent but regression is not for the case of bounded orthonormal systems.…”
Section: Asymptotic Separation Between Kernel Classification and Regression
Citation type: mentioning
confidence: 99%
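
The classification/regression separation described in this excerpt can be loosely illustrated with a toy experiment (an assumed setup with a single high-variance signal feature, not the bi-level ensemble analyzed in the cited works): the minimum-norm interpolator of ±1 labels usually gets the sign right, so its test classification error is small, while its predictions shrink in magnitude, so its squared-error risk stays a sizable fraction of the null risk of 1.

```python
# Rough toy sketch (assumed setup, not the ensemble from the cited works):
# the minimum-norm interpolator of +/-1 labels classifies well (its sign is
# usually correct) while its squared-error regression risk remains a large
# fraction of the null risk of 1, because its predictions shrink in scale.
import numpy as np

rng = np.random.default_rng(2)
n, d, n_test = 100, 5000, 2000
scale = np.ones(d)
scale[0] = 10.0                                  # one high-variance "signal" feature

def sample(m):
    X = rng.standard_normal((m, d)) * scale
    return X, np.sign(X[:, 0])                   # label = sign of the signal feature

X_tr, y_tr = sample(n)
X_te, y_te = sample(n_test)

w = X_tr.T @ np.linalg.solve(X_tr @ X_tr.T, y_tr)    # min-norm interpolator
pred = X_te @ w
print("test classification error:", np.mean(np.sign(pred) != y_te))
print("test squared error (null risk is 1.0):", np.mean((pred - y_te) ** 2))
```
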
“…In a related line of work, [15], [16] show that the max-margin support-vector-machine can be classification-consistent in the overparametrized setting even when the corresponding regression task does not generalize. These results require even stricter assumptions; in fact, they require independence of the features used in the linear model in the very first step of obtaining sharp expressions for the classification generalization error.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Han et al [26] uncovered that the "neural collapse" phenomenon also occurs under square loss where the last-layer features eventually collapse to their simplex-style class-means. Muthukumar et al [27] compared classification and regression tasks in the overparameterized linear model with Gaussian features, illustrating different roles and properties of loss functions used at the training and testing phases. Poggio and Liao [28] made interesting observations on effects of popular regularization techniques such as batch normalization and weight decay on the gradient flow dynamics under square loss.…”
Citation type: mentioning
confidence: 99%