2021
DOI: 10.1016/j.neunet.2021.02.012

Fast convergence rates of deep neural networks for classification

Cited by 42 publications (55 citation statements)
References 19 publications
“…When q = 0 (high noise), the convergence rate with respect to the sample size is n^{−α/(0.75d+2α)}, which is exactly the same as in the Modified Logistic or exponential examples; when q = +∞ (no noise), the rate will be significantly improved to n^{−1/2}. A similar result can be found in Theorem 3.3 of Kim et al. (2021).…”
Section: Example 4: SVM (supporting)
confidence: 83%
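A quick arithmetic check of why the q = +∞ rate is indeed an improvement (this remark is not part of the quoted excerpt): for any input dimension d > 0 and smoothness α > 0,

    α / (0.75d + 2α) < α / (2α) = 1/2,

so the noiseless exponent n^{−1/2} strictly dominates the high-noise exponent n^{−α/(0.75d+2α)}, and the gap widens as d grows.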
“…When the inputs are assumed to be uniformly distributed on the surface of a sphere, Kalai et al. (2008) derived non-asymptotic bounds for efficient binary prediction with half-spaces by minimizing the misclassification error rate directly. Kim et al. (2021) studied the excess risk of the empirical risk minimizer for classification under the hinge loss (SVM) using deep neural networks. They aimed to establish the convergence rate under the Tsybakov noise condition (Mammen and Tsybakov, 1999; Tsybakov, 2004) in three different cases: smooth decision boundary, smooth conditional class probability η, and margin conditions.…”
Section: Error Bounds in Regression and Classification (mentioning)
confidence: 99%
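For reference (not part of the quoted excerpt), the Tsybakov noise condition mentioned above is commonly stated as follows: there exist constants C > 0, t₀ > 0 and a noise exponent q ≥ 0 such that

    P( |η(X) − 1/2| ≤ t ) ≤ C t^q   for all 0 < t ≤ t₀,

where η(x) = P(Y = 1 | X = x) is the conditional class probability. Larger q means less probability mass near the decision boundary {η = 1/2}, which is why the rates above improve as q grows; q = 0 imposes essentially no restriction, while q = +∞ corresponds to the noiseless case in which η is bounded away from 1/2.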
“…Schmidt-Hieber (2020) shows minimax-optimality of s-sparse neural networks for regression over Hölder classes, where at most s = O(n log n) network weights are nonzero and n is the number of training samples. Kim et al. (2021) extends the results of Schmidt-Hieber (2020) to the classification setting, remarking that effective optimization under a sparsity constraint is lacking. Kohler & Langer (2020) and Langer (2021) proved minimax-optimality without the sparsity assumption, albeit in an underparametrized setting.…”
Section: Related Work (supporting)
confidence: 57%
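To make the sparsity constraint discussed above concrete, here is a minimal, hypothetical sketch (an illustration only, not the estimators analyzed in the cited papers) of what enforcing "at most s nonzero weights" looks like for a small PyTorch network, via a hard magnitude projection onto the s-sparse parameter set:

    import torch
    import torch.nn as nn

    def project_to_s_sparse(parameters, s):
        # Hard projection onto the s-sparse set: keep the s largest-magnitude
        # parameters of the whole network and zero out the rest
        # (exact ties may leave slightly more than s nonzero entries).
        params = list(parameters)
        flat = torch.cat([p.detach().abs().flatten() for p in params])
        if s >= flat.numel():
            return  # already feasible, nothing to zero out
        threshold = torch.topk(flat, s).values.min()
        with torch.no_grad():
            for p in params:
                p.mul_((p.abs() >= threshold).to(p.dtype))

    # Example: a small ReLU network projected to roughly 100 nonzero parameters.
    net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
    project_to_s_sparse(net.parameters(), s=100)
    nonzero = sum((p != 0).sum().item() for p in net.parameters())
    print("nonzero parameters after projection:", nonzero)

The hard part, which the next excerpt alludes to, is choosing which s entries to keep; the greedy magnitude rule above is only a heuristic for that combinatorial choice.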
“…Our result is the first adaptive optimal one for fully connected (i.e., non-sparse) neural network models. Many of the related studies showing minimax optimality of neural network estimators assumed sparsity of the neural networks (in both adaptive and nonadaptive fashions) to control the estimation variance while maintaining high approximation ability, for example, Schmidt-Hieber [2020], Suzuki [2019], Imaizumi and Fukumizu [2020], Ohn and Kim [2020] and Kim et al. [2021]. However, finding an optimal sparse neural network is computationally expensive, since we need to explore all possible zero-one configurations of the network parameters.…”
Section: Oracle Contraction Rate for Regression Function Estimation (mentioning)
confidence: 99%
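As a rough illustration of the combinatorial cost mentioned above (a back-of-the-envelope calculation added here, not taken from the quoted paper): a network with p parameters admits C(p, s) ≥ (p/s)^s zero-one sparsity patterns with exactly s nonzero weights, and already for p = 10^6 and s = 10^3 this exceeds 10^{3000}, so exhaustive search over configurations is infeasible and sparsity has to be induced indirectly, e.g., by penalization or pruning.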