2021
DOI: 10.48550/arxiv.2105.08675
Preprint

The Computational Complexity of ReLU Network Training Parameterized by Data Dimensionality

Vincent Froese,
Christoph Hertrich,
Rolf Niedermeier

Abstract: Understanding the computational complexity of training simple neural networks with rectified linear units (ReLUs) has recently been a subject of intensive research. Closing gaps and complementing results from the literature, we present several results on the parameterized complexity of training two-layer ReLU networks with respect to various loss functions. After a brief discussion of other parameters, we focus on analyzing the influence of the dimension d of the training data on the computational complexity. …
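To make the training problem concrete, the following display spells out a standard empirical risk minimization formulation for a two-layer ReLU network; the symbols $n$, $k$, and $\ell$ are generic notation for the number of data points, the number of hidden ReLU units, and the loss function, and are not taken verbatim from the paper.

\[
\min_{a \in \mathbb{R}^{k},\; W \in \mathbb{R}^{k \times d},\; b \in \mathbb{R}^{k}} \;\; \sum_{i=1}^{n} \ell\!\left( \sum_{j=1}^{k} a_j \max\bigl(0,\; \langle w_j, x_i \rangle + b_j \bigr),\; y_i \right),
\]

where $(x_1, y_1), \dots, (x_n, y_n) \in \mathbb{R}^{d} \times \mathbb{R}$ are the training points and $w_j$ denotes the $j$-th row of $W$. The paper's parameterized-complexity analysis asks how the cost of solving this problem scales when the data dimensionality $d$ is treated as the parameter.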

Cited by 4 publications (5 citation statements)
References 11 publications
“…For instance, in Arora et al. [2018], an understanding of single-layer ReLU networks enables the design of a globally optimal algorithm for solving the empirical risk minimization (ERM) problem that runs in polynomial time in the number of data points in fixed dimension. See also Goel et al. [2017, 2018], Goel and Klivans [2019], Boob et al. [2020], Goel et al. [2021], and Froese et al. [2021] for similar lines of work.…”
Section: Related Work (mentioning)
confidence: 95%
“…However, for a full theoretical understanding of this fundamental machine learning model, it is necessary to understand which functions can be exactly expressed with different NN architectures. For instance, insights about exact representability have boosted our understanding of the computational complexity of training an NN, with respect to both algorithms [4,36] and hardness results [9,18,20]. It is known that a function can be expressed with a ReLU NN if and only if it is continuous and piecewise linear (CPWL) [4].…”
Section: Introduction (mentioning)
confidence: 99%
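As a concrete illustration of the CPWL characterization quoted above, the maximum of two real inputs (a standard identity, not an example drawn from the cited paper) is exactly representable with ReLU units:

\[
\max(x, y) \;=\; \tfrac{1}{2}\bigl(x + y + |x - y|\bigr),
\qquad
|z| \;=\; \max(0, z) + \max(0, -z),
\]

so a small two-layer ReLU network computes $\max(x, y)$ exactly; the "if and only if" direction attributed to [4] says that every CPWL function, and only CPWL functions, can be obtained this way.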
“…It is a well-known fact that minimizing the training error of a neural network is a computationally hard problem for a large variety of activation functions and architectures [75]. For ReLU networks, NP-hardness, parameterized hardness, and inapproximability results have been established even for the simplest possible architecture consisting of only a single ReLU neuron [15,24,32,38]. On the positive side, the seminal algorithm by Arora, Basu, Mianjy, and Mukherjee [7] solves empirical risk minimization for 2-layer ReLU networks and one-dimensional output to global optimality.…”
Section: Introduction (mentioning)
confidence: 99%
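For orientation, the single-ReLU-neuron training problem referred to above can be written, for squared loss, as

\[
\min_{w \in \mathbb{R}^{d},\; b \in \mathbb{R}} \;\; \sum_{i=1}^{n} \bigl( \max(0,\; \langle w, x_i \rangle + b) - y_i \bigr)^{2},
\]

a minimal sketch of the simplest instance; the hardness, parameterized hardness, and inapproximability results cited above concern this problem and close variants with other loss functions.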
“…On the positive side, the seminal algorithm by Arora, Basu, Mianjy, and Mukherjee [7] solves empirical risk minimization for 2-layer ReLU networks and one-dimensional output to global optimality. It was later extended to a more general class of loss functions by Froese, Hertrich, and Niedermeier [32]. The running time is exponential in the number of neurons in the hidden layer and in the input dimension, but polynomial in the number of data points if the former two parameters are considered constant.…”
Section: Introduction (mentioning)
confidence: 99%
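For context, the running-time shape described in this excerpt (exponential in the hidden-layer width and the input dimension, polynomial in the number of data points once those two are fixed) corresponds roughly to a bound of the form

\[
2^{k} \cdot n^{O(k \cdot d)} \cdot \mathrm{poly}(L),
\]

where $k$ is the number of hidden neurons, $d$ the input dimension, $n$ the number of data points, and $L$ the encoding length of the input; this is a schematic rendering of the dependence stated above, not a bound quoted verbatim from [7] or [32].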