2019
DOI: 10.48550/arxiv.1911.10651
Preprint

Trajectory growth lower bounds for random sparse deep ReLU networks

Abstract: This paper considers the growth in the length of one-dimensional trajectories as they are passed through deep ReLU neural networks, which, among other things, is one measure of the expressivity of deep networks. We generalise existing results, providing an alternative, simpler method for lower bounding expected trajectory growth through random networks, for a more general class of weights distributions, including sparsely connected networks. We illustrate this approach by deriving bounds for sparse-Gaussian, s…
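
As a concrete illustration of the quantity the abstract refers to, the sketch below (my own, not the paper's code; the circular input curve, layer width, sparsity level, and He-style variance scaling are all illustrative assumptions) passes a one-dimensional input trajectory through a stack of random sparse-Gaussian ReLU layers and reports how its length changes with depth.

```python
# Minimal sketch (not the paper's code): estimate how the length of a 1-D input
# trajectory changes as it is pushed through a random sparse-Gaussian ReLU network.
import numpy as np

rng = np.random.default_rng(0)

def trajectory(n_points=1000, dim=64):
    """Points along a circle embedded in the input space (a 1-D trajectory)."""
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    e1, e2 = np.eye(dim)[0], np.eye(dim)[1]
    return np.outer(np.cos(t), e1) + np.outer(np.sin(t), e2)

def polyline_length(points):
    """Length of the piecewise-linear curve through consecutive points."""
    return np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))

def sparse_gaussian_layer(x, width, sparsity=0.5, sigma=None):
    """One ReLU layer with sparse-Gaussian weights: each entry is N(0, sigma^2)
    with probability `sparsity` and exactly zero otherwise."""
    fan_in = x.shape[1]
    if sigma is None:
        # He-style scaling adjusted for the expected fraction of nonzero weights
        # (an assumed choice, so the layer roughly preserves norms on average).
        sigma = np.sqrt(2.0 / (sparsity * fan_in))
    mask = rng.random((fan_in, width)) < sparsity
    W = rng.normal(0.0, sigma, size=(fan_in, width)) * mask
    return np.maximum(x @ W, 0.0)

x = trajectory()
input_len = polyline_length(x)
for depth in range(1, 11):
    x = sparse_gaussian_layer(x, width=64)
    print(f"depth {depth:2d}: length ratio = {polyline_length(x) / input_len:.3f}")
```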

Cited by 1 publication (3 citation statements)
References 5 publications (5 reference statements)
“…This may be done by considering a set of inputs lying along a curve and measuring the length of the corresponding curve of outputs. It has been claimed in prior literature that in a ReLU network this length distortion grows exponentially with the network's depth [24,23]. We prove that, in fact, the expected length distortion does not grow at all with depth.…”
Section: Introduction (mentioning)
confidence: 60%
“…Average-case analyses for ReLU networks in [24,23] showed that the expected length distortion can grow exponentially with depth, while [22] presented a similar analysis for the curvature of output trajectories. However, these results rely upon initializing the weights of the network with a distribution that also leads to exponentially large outputs [14] and exploding gradients [11], both of which are avoided by the standard "He normal" initialization [17].…”
Section: Related Work (mentioning)
confidence: 99%
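
The contrast drawn in the last statement can be sketched numerically. The snippet below is my own rough illustration (the width, depth, input curve, and the particular "larger" variance are arbitrary choices, not taken from any of the cited papers): it compares trajectory length growth under standard He-normal weight scaling against a larger weight variance of the kind the quoted statement attributes to [24,23].

```python
# Rough numerical illustration (not from the cited papers): compare trajectory
# length growth under "He normal" scaling (Var = 2 / fan_in) versus a
# deliberately larger weight variance.
import numpy as np

rng = np.random.default_rng(1)

def length(points):
    return np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))

def run(depth, width, weight_std):
    # A 1-D trajectory (a circle) mapped into the input space of width `width`.
    t = np.linspace(0.0, 2.0 * np.pi, 500)
    x = np.stack([np.cos(t), np.sin(t)], axis=1) @ rng.normal(size=(2, width))
    x /= np.linalg.norm(x, axis=1, keepdims=True).mean()  # normalise input scale
    base = length(x)
    for _ in range(depth):
        W = rng.normal(0.0, weight_std, size=(width, width))
        x = np.maximum(x @ W, 0.0)
    return length(x) / base

width, depth = 64, 20
he_std = np.sqrt(2.0 / width)   # standard "He normal" scaling
big_std = 2.0 * he_std          # 4x larger variance, chosen for illustration
print("He normal length ratio:      ", run(depth, width, he_std))
print("Larger-variance length ratio:", run(depth, width, big_std))
```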