Nonconvex Sparse Regularization for Deep Neural Networks and Its Optimality

Ohn, Ilsang; Kim, Yongdai

doi:10.1162/neco_a_01457

Cited by 9 publications

(20 citation statements)

References 15 publications

“…where $g(x) = x(1 - x)$. Applying (4) and (7), it is then shown in [10], Lemma A.2, that there is a network from $F(m + 4, p)$ with $\|p\|_\infty = 6$, that for a given input $(x, y)$ approximates the product $xy$ with error $2^{-m}$. As it follows from (4), (5) and (6), the set of parameters in the construction of that network that does not belong to $\{0, \pm \frac{1}{2}, \pm 1\}$ consists of • shift coordinates $\pm 2^{-k}$, $k = 2, \ldots, 2m + 1$;…”

Section: Proofs (mentioning)

confidence: 99%
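
The product approximation described in this statement can be illustrated numerically. The sketch below is not the network from [10]. It assumes the standard tent-map construction, in which $g(x) = x(1 - x)$ is approximated by a sum of composed ReLU tent maps, and it recovers the product from the identity $xy = g((x - y + 1)/2) - g((x + y)/2) + (x + y)/2 - 1/4$. The function names and the exact form of `tent` are assumptions made for illustration.

```python
def relu(z: float) -> float:
    return max(z, 0.0)


def tent(x: float, k: int) -> float:
    # Assumed tent map T^k(x) = min(x/2, 2^(1-2k) - x/2),
    # written with two ReLU units as (x/2)_+ - (x - 2^(1-2k))_+.
    return relu(x / 2) - relu(x - 2.0 ** (1 - 2 * k))


def approx_g(x: float, m: int) -> float:
    # Approximates g(x) = x(1 - x) on [0, 1] by T^1(x) + T^2(T^1(x)) + ...
    # truncated after m terms (a piecewise linear interpolant of g).
    total, r = 0.0, x
    for k in range(1, m + 1):
        r = tent(r, k)
        total += r
    return total


def approx_product(x: float, y: float, m: int) -> float:
    # Polarization-type identity: both arguments of g stay in [0, 1]
    # whenever x and y are in [0, 1].
    a = (x - y + 1) / 2
    b = (x + y) / 2
    return approx_g(a, m) - approx_g(b, m) + (x + y) / 2 - 0.25


if __name__ == "__main__":
    m = 6
    worst = max(
        abs(approx_product(i / 50, j / 50, m) - (i / 50) * (j / 50))
        for i in range(51)
        for j in range(51)
    )
    print(f"m = {m}, worst observed error = {worst:.2e}, bound 2^-m = {2.0 ** -m:.2e}")
```

For inputs in $[0, 1]$, the observed error with `m` tent maps stays below $2^{-m}$, which is consistent with the bound quoted above.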

“…Using this entropy bound, it is then shown in [10], that if the regression function is a composition of Hölder smooth functions, then sparse neural networks with depth $\log_2 n$, width $n^{\frac{t}{2\beta+t}}$ and the number of non-zero parameters $\sim n^{\frac{t}{2\beta+t}} \log_2 n$, where $\beta > 0$ and $t \geq 1$ depend on the structure and the smoothness of the regression function, attain the minimax optimal prediction error rate $n^{-\frac{2\beta}{2\beta+t}}$ (up to a logarithmic factor). Entropy bounds for the spaces of neural networks with certain $\ell_1$-related regularizations are provided in [7] and [11] and their derivation is also based on the sparsity induced by the imposed constraints. In particular, in [7] the above $\ell_0$ regularization is replaced by the clipped $\ell_1$ norm regularization with sufficiently small clipping threshold.…”

Section: Introduction (mentioning)

confidence: 99%

“…Entropy bounds for the spaces of neural networks with certain $\ell_1$-related regularizations are provided in [7] and [11] and their derivation is also based on the sparsity induced by the imposed constraints. In particular, in [7] the above $\ell_0$ regularization is replaced by the clipped $\ell_1$ norm regularization with sufficiently small clipping threshold. Networks with $\ell_1$ norm of all parameters bounded by 1 are considered in [11].…”

Section: Introduction (mentioning)

confidence: 99%
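
The relation between the clipped $\ell_1$ penalty and $\ell_0$ sparsity mentioned in the two statements above can be made concrete with a small sketch. It assumes the penalty has the form $\sum_j \min(|\theta_j| / \tau, 1)$ with clipping threshold $\tau > 0$; this normalization, and the names `clipped_l1` and `l0_count`, are assumptions for illustration rather than definitions taken from the quoted text.

```python
from typing import Sequence


def clipped_l1(theta: Sequence[float], tau: float) -> float:
    # Assumed clipped l1 penalty: each coordinate contributes |theta_j| / tau,
    # capped at 1 once |theta_j| reaches the clipping threshold tau.
    if tau <= 0:
        raise ValueError("tau must be positive")
    return sum(min(abs(t) / tau, 1.0) for t in theta)


def l0_count(theta: Sequence[float]) -> int:
    # Number of non-zero coordinates (the l0 "norm").
    return sum(1 for t in theta if t != 0)


if __name__ == "__main__":
    theta = [0.0, 0.5, -2.0, 0.0, 1e-3]
    for tau in (1.0, 1e-2, 1e-6):
        print(f"tau = {tau:g}: clipped l1 = {clipped_l1(theta, tau):.4f}")
    print(f"l0 count = {l0_count(theta)}")
```

Under this assumed form, the penalty never exceeds the number of non-zero parameters and matches it once $\tau$ is smaller than every non-zero $|\theta_j|$, which is one way to read the phrase "sufficiently small clipping threshold".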

Function approximation by deep neural networks with parameters $\{0,\pm \frac{1}{2}, \pm 1, 2\}$

Beknazaryan

2021

Preprint

In this paper it is shown that $C^\beta$-smooth functions can be approximated by neural networks with parameters $\{0, \pm \frac{1}{2}, \pm 1, 2\}$. The depth, width and number of active parameters of the constructed networks have, up to a logarithmic factor, the same dependence on the approximation error as networks with parameters in $[-1, 1]$. In particular, this means that nonparametric regression estimation with the constructed networks attains the same convergence rate as with sparse networks with parameters in $[-1, 1]$.

“…In nonparametric regression estimation we aim to recover an unknown $d$-variate function $g_0$ based on $n$ observed input-output pairs $(X_i, Y_i) \in \mathbb{R}^d \times \mathbb{R}$, $i = 1, \ldots, n$. Various regression estimating function classes, including wavelets, polynomials, splines and kernel estimates have been studied in the literature (see, e.g., [2], [5], [6], [7] and references therein). Along with the development of practical and theoretical applications of neural networks, regression estimations with neural networks are becoming popular in the recent literature (see, e.g., [1], [8], [9], [10], [13], [15], [18], [19], [21] and references therein). Usually a class of neural networks with properly chosen architecture and with weight vectors belonging to some regularized set $W_n$ is determined and the estimator $\hat{g}_n$ of $g_0$ is selected to be either the regularized empirical risk minimizer…”

Section: Introduction (mentioning)

confidence: 99%

“…(i) deriving prediction rates of the empirical risk minimizers (1) or (2); (ii) finding an optimization algorithm that identifies the corresponding empirical risk minimizers. Convergence rates of empirical risk minimizers (ERM) over the classes of deep ReLU networks are studied in [4], [13], [15] and [18]. In [4] it is shown that the ERM of the form (1), with $W_n$ being the set of weight vectors with coordinates $\{0, \pm 1/2, \pm 1, 2\}$, attains, up to logarithmic factors, the minimax rates of prediction of $\beta$-smooth functions.…”

Section: Introduction (mentioning)

confidence: 99%

Nonparametric regression with modified ReLU networks

Beknazaryan, Sang

2022

Preprint

We consider regression estimation with modified ReLU neural networks in which network weight matrices are first modified by a function $\alpha$ before being multiplied by input vectors. We give an example of a continuous, piecewise linear function $\alpha$ for which the empirical risk minimizers over the classes of modified ReLU networks with $\ell_1$ and squared $\ell_2$ penalties attain, up to a logarithmic factor, the minimax rate of prediction of an unknown $\beta$-smooth function.

Sparse-penalized deep neural networks estimator under weak dependence

Kengne, Wade

2024

Metrika
