2021
DOI: 10.48550/arxiv.2105.08675
Preprint

The Computational Complexity of ReLU Network Training Parameterized by Data Dimensionality

Vincent Froese,
Christoph Hertrich,
Rolf Niedermeier

Abstract: Understanding the computational complexity of training simple neural networks with rectified linear units (ReLUs) has recently been a subject of intensive research. Closing gaps and complementing results from the literature, we present several results on the parameterized complexity of training two-layer ReLU networks with respect to various loss functions. After a brief discussion of other parameters, we focus on analyzing the influence of the dimension d of the training data on the computational complexity. …
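To make the training problem concrete, the following display spells out a standard empirical risk minimization formulation for a two-layer ReLU network; the symbols $n$, $k$, and $\ell$ are generic notation for the number of data points, the number of hidden ReLU units, and the loss function, and are not taken verbatim from the paper.

\[
\min_{a \in \mathbb{R}^{k},\; W \in \mathbb{R}^{k \times d},\; b \in \mathbb{R}^{k}} \;\; \sum_{i=1}^{n} \ell\!\left( \sum_{j=1}^{k} a_j \max\bigl(0,\; \langle w_j, x_i \rangle + b_j \bigr),\; y_i \right),
\]

where $(x_1, y_1), \dots, (x_n, y_n) \in \mathbb{R}^{d} \times \mathbb{R}$ are the training points and $w_j$ denotes the $j$-th row of $W$. The paper's parameterized-complexity analysis asks how the cost of solving this problem scales when the data dimensionality $d$ is treated as the parameter.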

Cited by 4 publications (5 citation statements)
References 11 publications
“…For instance, in Arora et al. [2018], an understanding of single-layer ReLU networks enables the design of a globally optimal algorithm for solving the empirical risk minimization (ERM) problem that runs in polynomial time in the number of data points in fixed dimension. See also Goel et al. [2017, 2018], Goel and Klivans [2019], Boob et al. [2020], Goel et al. [2021], and Froese et al. [2021] for similar lines of work.…”
Section: Related Work (mentioning)
confidence: 95%
“…However, for a full theoretical understanding of this fundamental machine learning model, it is necessary to understand which functions can be exactly expressed with different NN architectures. For instance, insights about exact representability have boosted our understanding of the computational complexity of training an NN, with respect to both algorithms [4,36] and hardness results [9,18,20]. It is known that a function can be expressed with a ReLU NN if and only if it is continuous and piecewise linear (CPWL) [4].…”
Section: Introduction (mentioning)
confidence: 99%
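As a concrete illustration of the CPWL characterization quoted above, the maximum of two real inputs (a standard identity, not an example drawn from the cited paper) is exactly representable with ReLU units:

\[
\max(x, y) \;=\; \tfrac{1}{2}\bigl(x + y + |x - y|\bigr),
\qquad
|z| \;=\; \max(0, z) + \max(0, -z),
\]

so a small two-layer ReLU network computes $\max(x, y)$ exactly; the "if and only if" direction attributed to [4] says that every CPWL function, and only CPWL functions, can be obtained this way.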
“…It is a well-known fact that minimizing the training error of a neural network is a computationally hard problem for a large variety of activation functions and architectures [75]. For ReLU networks, NP-hardness, parameterized hardness, and inapproximability results have been established even for the simplest possible architecture consisting of only a single ReLU neuron [15,24,32,38]. On the positive side, the seminal algorithm by Arora, Basu, Mianjy, and Mukherjee [7] solves empirical risk minimization for 2-layer ReLU networks and one-dimensional output to global optimality.…”
Section: Introduction (mentioning)
confidence: 99%
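For orientation, the single-ReLU-neuron training problem referred to above can be written, for squared loss, as

\[
\min_{w \in \mathbb{R}^{d},\; b \in \mathbb{R}} \;\; \sum_{i=1}^{n} \bigl( \max(0,\; \langle w, x_i \rangle + b) - y_i \bigr)^{2},
\]

a minimal sketch of the simplest instance; the hardness, parameterized hardness, and inapproximability results cited above concern this problem and close variants with other loss functions.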
“…On the positive side, the seminal algorithm by Arora, Basu, Mianjy, and Mukherjee [7] solves empirical risk minimization for 2-layer ReLU networks and one-dimensional output to global optimality. It was later extended to a more general class of loss functions by Froese, Hertrich, and Niedermeier [32]. The running time is exponential in the number of neurons in the hidden layer and in the input dimension, but polynomial in the number of data points if the former two parameters are considered constant.…”
Section: Introduction (mentioning)
confidence: 99%
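For context, the running-time shape described in this excerpt (exponential in the hidden-layer width and the input dimension, polynomial in the number of data points once those two are fixed) corresponds roughly to a bound of the form

\[
2^{k} \cdot n^{O(k \cdot d)} \cdot \mathrm{poly}(L),
\]

where $k$ is the number of hidden neurons, $d$ the input dimension, $n$ the number of data points, and $L$ the encoding length of the input; this is a schematic rendering of the dependence stated above, not a bound quoted verbatim from [7] or [32].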