Abstract. It is known that ab initio molecular dynamics based on the electron ground-state eigenvalue can be used to approximate quantum observables in the canonical ensemble when the temperature is low compared to the first electron eigenvalue gap. This work proves that a certain weighted average of the different ab initio dynamics, corresponding to each electron eigenvalue, approximates quantum observables for any temperature. The proof uses the semiclassical Weyl law to show that canonical quantum observables of nuclei-electron systems, based on matrix-valued Hamiltonian symbols, can be approximated by ab initio molecular dynamics with an error proportional to the electron-nuclei mass ratio. The result covers observables that depend on time correlations. A combination of the Hilbert-Schmidt inner product for quantum operators and Weyl's law shows that the error estimate holds for observables and Hamiltonian symbols that have three and five bounded derivatives, respectively, provided the electron eigenvalues are distinct for every nuclei position and the observables are diagonal with respect to the electron eigenstates.
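The weighted average of ab initio dynamics can be sketched as follows; the notation here is an illustrative assumption, not taken from the paper.

```latex
% Sketch (notation assumed): \lambda_j(X) denotes the j-th electron eigenvalue
% at nuclei position X, H_j(z) = |p|^2/2 + \lambda_j(X) the corresponding
% ab initio Hamiltonian on phase space z = (X, p), and M the electron-nuclei
% mass ratio. The canonical quantum average of an observable \widehat{A} is
% approximated by Gibbs-weighted averages over the flows z_t of each H_j:
\[
\frac{\operatorname{Tr}\!\big(\widehat{A}\, e^{-\beta \widehat{H}}\big)}
     {\operatorname{Tr}\!\big(e^{-\beta \widehat{H}}\big)}
= \frac{\sum_j \int A_j(z_t)\, e^{-\beta H_j(z_0)}\, \mathrm{d}z_0}
       {\sum_j \int e^{-\beta H_j(z_0)}\, \mathrm{d}z_0}
  + \mathcal{O}(M)\,,
\]
% so the Gibbs factors e^{-\beta H_j(z_0)} supply the weights of the
% different ab initio dynamics, one for each electron eigenvalue.
```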
Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $\bar z_{\ell +1}=\bar z_\ell + \textrm {Re}\sum _{k=1}^K\bar b_{\ell k}\,e^{\textrm {i}\omega _{\ell k}\bar z_\ell }+ \textrm {Re}\sum _{k=1}^K\bar c_{\ell k}\,e^{\textrm {i}\omega ^{\prime}_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega _{\ell k},\omega ^{\prime}_{\ell k})$ of the random Fourier features $e^{\textrm {i}\omega _{\ell k}\bar z_\ell }$ and $e^{\textrm {i}\omega ^{\prime}_{\ell k}\cdot x}$ is derived. This derivation is based on the corresponding generalization error for the approximation of the function values $f(x)$. The generalization error turns out to be smaller than the estimate ${\|\hat f\|^2_{L^1({\mathbb {R}}^d)}}/{(KL)}$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case that the $L^\infty $-norm of $f$ is much smaller than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network. Promising performance of the proposed algorithm is demonstrated in computational experiments.
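The layer update above can be sketched as a forward pass; shapes, initialization, and the scalar state are illustrative assumptions, not the paper's implementation.

```python
# Sketch (not the paper's code): forward pass of a residual network with L
# random Fourier feature layers,
#   z_{l+1} = z_l + Re sum_k b_{lk} exp(i w_{lk} z_l)
#                 + Re sum_k c_{lk} exp(i w'_{lk} . x).
import numpy as np

def rff_resnet_forward(x, omegas, omegas_prime, b, c):
    """x: (d,) input; omegas: (L, K) frequencies acting on the scalar state;
    omegas_prime: (L, K, d) frequencies acting on x;
    b, c: (L, K) complex amplitudes. Returns the final scalar state."""
    z = 0.0
    L, K = b.shape
    for l in range(L):
        z = (z
             + np.real(b[l] @ np.exp(1j * omegas[l] * z))
             + np.real(c[l] @ np.exp(1j * (omegas_prime[l] @ x))))
    return z

# Example with random parameters (amplitudes scaled by 1/(L*K)):
rng = np.random.default_rng(0)
L, K, d = 3, 8, 2
x = rng.standard_normal(d)
out = rff_resnet_forward(
    x,
    rng.standard_normal((L, K)),
    rng.standard_normal((L, K, d)),
    (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / (L * K),
    (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / (L * K),
)
```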
The supervised learning problem of determining a neural network approximation ${\mathbb {R}}^d\ni x\mapsto \sum _{k=1}^K\beta _k\,e^{\textrm {i}\omega _k\cdot x}$ with one hidden layer is studied as a random Fourier features algorithm. The Fourier features, i.e., the frequencies $\omega _k\in {\mathbb {R}}^d$, are sampled using an adaptive Metropolis sampler. The Metropolis test accepts proposal frequencies $\omega ^{\prime}_k$, having corresponding amplitudes $\beta ^{\prime}_k$, with the probability $\min \big \{1,(|\beta ^{\prime}_k|/|\beta _k|)^\gamma \big \}$, for a certain positive parameter $\gamma $, determined by minimizing the approximation error for given computational work. This adaptive, non-parametric stochastic method leads asymptotically, as $K\to \infty $, to equidistributed amplitudes $|\beta _k|$, analogous to deterministic adaptive algorithms for differential equations. The equidistributed amplitudes are shown to asymptotically correspond to the optimal density for independent samples in random Fourier features methods. Numerical evidence is provided in order to demonstrate the approximation properties and efficiency of the proposed algorithm. The algorithm is tested both on synthetic data and a real-world high-dimensional benchmark.
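One step of the adaptive Metropolis sampler can be sketched as below. The least-squares refitting of the amplitudes and the Gaussian proposal are assumptions for illustration; only the acceptance rule $\min \{1,(|\beta ^{\prime}_k|/|\beta _k|)^\gamma \}$ is taken from the abstract.

```python
# Sketch (hypothetical helpers): one Metropolis step on the frequencies of a
# random Fourier features model f(x) ~ sum_k beta_k exp(i omega_k . x).
import numpy as np

def fit_amplitudes(omega, x, y):
    # Least-squares fit of the amplitudes beta for a given frequency set;
    # this refitting step is an assumption about how beta is obtained.
    S = np.exp(1j * x @ omega.T)  # (N, K) random-feature matrix
    beta, *_ = np.linalg.lstsq(S, y.astype(complex), rcond=None)
    return beta

def metropolis_step(omega, x, y, gamma=3.0, step=0.5, rng=None):
    rng = rng or np.random.default_rng()
    beta = fit_amplitudes(omega, x, y)
    omega_prop = omega + step * rng.standard_normal(omega.shape)
    beta_prop = fit_amplitudes(omega_prop, x, y)
    # Componentwise Metropolis test on the amplitude ratio, raised to gamma:
    accept = rng.random(len(beta)) < np.minimum(
        1.0, (np.abs(beta_prop) / np.abs(beta)) ** gamma)
    omega_new = np.where(accept[:, None], omega_prop, omega)
    return omega_new, accept

# Example on synthetic one-dimensional data:
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3 * x[:, 0])
omega = rng.standard_normal((16, 1))
omega, accept = metropolis_step(omega, x, y, rng=rng)
```

Iterating this step drives the amplitudes $|\beta _k|$ toward equidistribution, the asymptotic behavior described above.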