“…Due to its importance for understanding the behavior, performance, and limitations of machine learning algorithms, the study of the loss landscape of training problems for artificial neural networks has received considerable attention in recent years. Compare, for instance, the early works [3,6,34] on this topic, the contributions on stationary points and plateau phenomena in [1,9,15,17,50], the results on suboptimal local minima and valleys in [11,19,24,37,41,48,52], and the overview articles [5,45,46]. For fully connected feedforward neural networks involving activation functions with an affine segment, much of the research on landscape properties was initially motivated by the observation of Kawaguchi [30] that networks with linear activation functions give rise to learning problems that do not possess spurious (i.e., not globally optimal) local minima and thus behave, at least as far as the notion of local optimality is concerned, like convex problems.…”