“…Inspired by the deterministic model [2] and by multigrid methods [24], Trask et al. [1,25] showed that a natural extension of POU-Net, the probabilistic partition of unity network (PPOU-Net), can be interpreted as a mixture-of-experts (MoE) model, and proposed an expectation-maximization (EM) training strategy as well as a hierarchical architecture to accelerate training and improve its conditioning. Classical approximation methods enjoy the advantages of computational efficiency and convergence guarantees when solving local, low-dimensional regression problems, but they often struggle in high dimensions or as global approximants [26]. Examples of such classical methods include truncated expansions in orthogonal polynomials (e.g., Chebyshev, Legendre, and Hermite polynomials) [27] and Fourier basis functions [28], rational functions [29], radial basis functions [30], splines [31], wavelets [32], kernel methods [33], sparse grids [34], etc.…”
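As a minimal illustration of the efficiency and rapid convergence of such classical bases in low dimensions, the sketch below fits truncated Chebyshev expansions of increasing degree to a smooth one-dimensional function using NumPy's `numpy.polynomial.chebyshev` utilities; the target function and degrees are arbitrary choices for demonstration, not taken from the text.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Smooth target on [-1, 1]; truncated Chebyshev expansions of such
# functions converge rapidly (geometrically) as the degree grows.
f = lambda x: np.exp(-x**2) * np.cos(4 * x)

# Sample at Chebyshev nodes to keep the least-squares fit well conditioned.
n = 200
x = np.cos(np.pi * (np.arange(n) + 0.5) / n)
y = f(x)

xt = np.linspace(-1.0, 1.0, 1000)  # dense grid for error measurement
for deg in (4, 8, 16):
    coefs = C.chebfit(x, y, deg)               # truncated expansion of degree `deg`
    err = np.max(np.abs(C.chebval(xt, coefs) - f(xt)))
    print(f"degree {deg:2d}: max error {err:.2e}")
```

The printed maximum errors shrink quickly with the degree, which is the low-dimensional behavior the passage contrasts with the difficulty these global bases face in high dimensions.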