“…Most existing approximation theories for deep neural networks so far focus on the approximation rate in the number of parameters W (Cybenko, 1989; Hornik, Stinchcombe, & White, 1989; Barron, 1993; Liang & Srikant, 2016; Yarotsky, 2017, 2018; Poggio, Mhaskar, Rosasco, Miranda, & Liao, 2017; Weinan & Wang, 2018; Petersen & Voigtlaender, 2018; Chui, Lin, & Zhou, 2018; Nakada & Imaizumi, 2019; Gribonval, Kutyniok, Nielsen, & Voigtlaender, 2019; Gühring, Kutyniok, & Petersen, 2019; Chen, Jiang, Liao, & Zhao, 2019; Li, Lin, & Shen, 2019; Suzuki, 2019; Bao et al., 2019; Opschoor, Schwab, & Zech, 2019; Yarotsky & Zhevnerchuk, 2019; Bölcskei, Grohs, Kutyniok, & Petersen, 2019; Montanelli & Du, 2019; Chen & Wu, 2019; Zhou, 2020; Montanelli & Yang, 2020; Montanelli, Yang, & Du, in press). From the point of view of theoretical difficulty, controlling two variables, N and L, in our theory is more challenging than controlling the single variable W as in the literature.…”