2020
DOI: 10.1109/jsait.2020.2984716

Harmless Interpolation of Noisy Data in Regression

Abstract: A continuing mystery in understanding the empirical success of deep neural networks is their ability to achieve zero training error and generalize well, even when the training data is noisy and there are more parameters than data points. We investigate this overparameterized regime in linear regression, where all solutions that minimize training error interpolate the data, including noise. We characterize the fundamental generalization (mean-squared) error of any interpolating solution in the presence of noise…
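As a minimal illustration of the setting the abstract describes (not code from the paper), the sketch below fits the minimum-l2-norm interpolator in an overparameterized linear regression with noisy labels and reports its training and test mean-squared error. The dimensions, noise level, and sparse ground-truth signal are assumptions chosen only for the demo.

```python
# Minimal sketch (assumed setup, not the paper's experiments): with more
# parameters d than samples n, the minimum-l2-norm least-squares solution
# interpolates the noisy training data exactly, and its generalization
# (mean-squared) error is estimated on fresh test data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                       # n samples, d >> n parameters (overparameterized)
beta_true = np.zeros(d)
beta_true[:5] = 1.0                  # sparse ground-truth signal (assumption for the demo)

X = rng.normal(size=(n, d))
y = X @ beta_true + 0.5 * rng.normal(size=n)           # noisy labels

# Minimum-norm interpolator: beta_hat = X^T (X X^T)^{-1} y, via the pseudoinverse.
beta_hat = np.linalg.pinv(X) @ y

train_mse = np.mean((X @ beta_hat - y) ** 2)           # ~0: the noise is interpolated
X_test = rng.normal(size=(1000, d))
y_test = X_test @ beta_true + 0.5 * rng.normal(size=1000)
test_mse = np.mean((X_test @ beta_hat - y_test) ** 2)  # test (generalization) error

print(f"train MSE: {train_mse:.2e}, test MSE: {test_mse:.3f}")
```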

Cited by 55 publications (18 citation statements)
References 57 publications
“…Deng et al. [24] characterize logistic regression test error using the Gaussian min-max theorem. Muthukumar et al. [25] provide bounds on the risk.…”
Section: Related Work (mentioning)
confidence: 99%
“…The reluctance to overfit inhibited exploration of a range of settings where y(x) = ⟨β_int, x⟩ provided optimal or near-optimal predictions. Very recently, these 'harmless interpolation' (Muthukumar, Vodrahalli, Subramanian and Sahai 2020b) or 'benign over-fitting' (Bartlett, Long, Lugosi and Tsigler 2020) regimes have become a very active direction of research, a development inspired by efforts to understand deep learning. In particular, provided a spectral characterization of models exhibiting this behaviour.…”
Section: When Do Minimum Norm Predictors Generalize? (mentioning)
confidence: 99%
“…Recent empirical evidence has shown that certain algorithms, contrary to classical learning theory, can interpolate noisy data (achieve zero training error) while also generalizing well out of sample (low test error) [2,10,14]. We have also seen this phenomenon rigorously analyzed in theory for parametric methods such as linear regression and random feature regression [1,3,7,9], as well as non-parametric methods such as kernel regression with singular kernels [4][5][6].…”
Section: Introduction (mentioning)
confidence: 99%