“…However, deeper networks require far fewer neurons to reach the same expressive power, yielding a potential theoretical explanation for the dominance of deep networks in practice [7,29,42,44,53,62,65,68,79,80,83]. Other related work includes counting and bounding the number of linear regions [43,59,60,64,65,74], classifying the set of functions exactly representable by different architectures [7,23,46,47,61,86], and analyzing the memorization capacity of ReLU networks [82,84,85].…”