Atsushi Nitanda (scite author profile)

5 Publications

79 Citation Statements Received

329 Citation Statements Given


Affiliations

RIKEN Center for Advanced Intelligence Project, Kyushu Institute of Technology, The University of Tokyo

Publications


Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space

Suzuki, Nitanda

2019

Preprint


Deep learning has exhibited superior performance on various tasks, especially for high-dimensional datasets such as images. To understand this property, we investigate the approximation and estimation ability of deep learning on anisotropic Besov spaces. The anisotropic Besov space is characterized by direction-dependent smoothness and includes several function classes that have been investigated thus far. We demonstrate that the approximation error and estimation error of deep learning depend only on the average value of the smoothness parameters across all directions. Consequently, the curse of dimensionality can be avoided if the smoothness of the target function is highly anisotropic. Unlike existing studies, our analysis does not require a low-dimensional structure of the input data. We also investigate the minimax optimality of deep learning and compare its performance with that of the kernel method (more generally, linear estimators). The results show that deep learning has better dependence on the input dimensionality if the target function possesses anisotropic smoothness, and that it achieves an adaptive rate for functions with spatially inhomogeneous smoothness.
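The effect of "average smoothness" on the rate can be illustrated numerically. This is a minimal sketch, assuming the harmonic-type aggregate commonly used for anisotropic Besov classes, 1/β̃ = Σ_i 1/β_i, and an error bound of order n^(-2β̃/(2β̃+1)); the exact form in the paper may carry additional log factors, and the function names are hypothetical.

```python
from fractions import Fraction


def effective_smoothness(betas):
    """Aggregate smoothness: 1 / beta_tilde = sum_i 1 / beta_i (assumed harmonic-type form)."""
    return 1 / sum(Fraction(1) / Fraction(b) for b in betas)


def rate_exponent(betas):
    """Exponent r in an error bound of order n^(-r), with r = 2*bt / (2*bt + 1)."""
    bt = effective_smoothness(betas)
    return 2 * bt / (2 * bt + 1)


# Isotropic: smoothness 2 in all 10 directions -> r = 2*2 / (2*2 + 10) = 2/7.
iso = rate_exponent([2] * 10)
# Anisotropic: rough (beta = 2) in 2 directions, very smooth (beta = 100) in the other 8.
aniso = rate_exponent([2, 2] + [100] * 8)
print(iso, aniso, float(aniso))
```

In this toy setting the anisotropic exponent is 50/77, close to the exponent 2/3 of a purely two-dimensional problem with smoothness 2, whereas the isotropic ten-dimensional exponent drops to 2/7.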


Particle dual averaging: optimization of mean field neural network with global convergence rate analysis

Nitanda, Suzuki

2022

J. Stat. Mech.


We propose the particle dual averaging (PDA) method, which generalizes the dual averaging method in convex optimization to optimization over probability distributions with a quantitative runtime guarantee. The algorithm consists of an inner loop and an outer loop: the inner loop uses the Langevin algorithm to approximately solve for a stationary distribution, which is then optimized in the outer loop. The method can thus be interpreted as an extension of the Langevin algorithm that naturally handles nonlinear functionals on the probability space. An important application of the proposed method is the optimization of neural networks in the mean field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but for which quantitative convergence rates can be challenging to obtain. By adapting finite-dimensional convex optimization theory to the space of measures, we analyze PDA in regularized empirical/expected risk minimization and establish quantitative global convergence in learning two-layer mean field neural networks under more general settings. Our theoretical results are supported by numerical simulations on neural networks of reasonable size.
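The two-loop structure described in this abstract can be sketched for a toy two-layer mean field network. This is a simplified illustration, not the authors' implementation: the tanh network, the squared loss, the L2 term `lam_reg * ||theta||^2` inside the Gibbs potential, outer-loop weights proportional to the iteration index, warm-started inner loops, and all function names and hyperparameters are hypothetical choices. The outer loop accumulates a weighted average of first variations of the loss (for the squared loss these reduce to residual vectors), and the inner loop runs unadjusted Langevin steps toward the corresponding Gibbs distribution.

```python
import numpy as np


def predict(theta, X):
    """Mean field two-layer network f(x) = (1/m) sum_i tanh(theta_i . x)."""
    return np.tanh(X @ theta.T).mean(axis=1)


def langevin_inner(theta, X, R, lam, lam_reg, step, n_steps, rng):
    """Unadjusted Langevin steps targeting q(theta) ∝ exp(-Phi(theta) / lam), with
    Phi(theta) = mean_j R[j] * tanh(theta . x_j) + lam_reg * ||theta||^2 (assumed potential)."""
    n = X.shape[0]
    for _ in range(n_steps):
        dact = 1.0 - np.tanh(X @ theta.T) ** 2      # (n, m): tanh'(theta_i . x_j)
        grad = (dact * R[:, None]).T @ X / n        # (m, d): gradient of the data term
        grad += 2.0 * lam_reg * theta               # gradient of the L2 term
        noise = rng.standard_normal(theta.shape)
        theta = theta - step * grad / lam + np.sqrt(2.0 * step) * noise
    return theta


def particle_dual_averaging(X, y, m=100, T=10, lam=1e-2, lam_reg=1e-2,
                            step=1e-3, n_inner=50, seed=0):
    """Hypothetical simplified PDA loop for 0.5 * mean_j (f(x_j) - y_j)^2."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((m, X.shape[1]))
    R_sum = np.zeros(X.shape[0])
    w_sum = 0.0
    for t in range(1, T + 1):
        # First variation at the current particles: theta -> mean_j r_t[j] * tanh(theta . x_j).
        r_t = predict(theta, X) - y
        R_sum += t * r_t        # weights proportional to t (assumed)
        w_sum += t
        # Inner loop: approximately sample the Gibbs distribution of the averaged first variation.
        theta = langevin_inner(theta, X, R_sum / w_sum, lam, lam_reg, step, n_inner, rng)
    return theta


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = np.tanh(X[:, 0])            # hypothetical target
theta = particle_dual_averaging(X, y)
print("train MSE:", np.mean((predict(theta, X) - y) ** 2))
```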


Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features

Yashima, Nitanda, Suzuki

2019

Preprint


Although kernel methods are widely used in many learning problems, they scale poorly to large datasets. To address this problem, sketching and stochastic gradient methods are the most commonly used techniques for deriving efficient large-scale learning algorithms. In this study, we consider solving a binary classification problem using random features and stochastic gradient descent. Recent research has shown that the expected classification error converges at an exponential rate under the strong low-noise condition. We extend these analyses to the random features setting. Using general Lipschitz loss functions, we analyze the error induced by the random features approximation in terms of the distance between the generated hypotheses, including population risk minimizers and empirical risk minimizers, and show that the expected classification error still converges exponentially even when the random features approximation is applied. Additionally, we demonstrate that the convergence rate does not depend on the number of features, and that, because of the strong low-noise condition, using random features in classification problems yields a significant computational benefit.
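The setting in this abstract can be made concrete with a small simulation. This is a minimal sketch under assumed choices, not the paper's algorithm or experiments: random Fourier features for a Gaussian kernel, the logistic loss as the Lipschitz loss, averaged SGD with a small L2 penalty, and noise-free labels with a margin as a simple instance of the strong low-noise condition. The function names `make_rff`, `averaged_sgd_logistic`, and `sample_low_noise` are hypothetical.

```python
import numpy as np


def make_rff(d, n_features, bandwidth, rng):
    """Random Fourier features approximating k(x, x') = exp(-||x - x'||^2 / (2 * bandwidth^2))."""
    W = rng.standard_normal((n_features, d)) / bandwidth
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def phi(X):
        return np.sqrt(2.0 / n_features) * np.cos(X @ W.T + b)

    return phi


def averaged_sgd_logistic(Z, y, step, reg, n_epochs, rng):
    """SGD on the L2-regularized logistic loss; returns the running average of the iterates."""
    n, D = Z.shape
    w = np.zeros(D)
    w_avg = np.zeros(D)
    k = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            margin = y[i] * (Z[i] @ w)
            coef = np.exp(-np.logaddexp(0.0, margin))   # sigmoid(-margin), overflow-safe
            w = w - step * (-coef * y[i] * Z[i] + reg * w)
            k += 1
            w_avg += (w - w_avg) / k
    return w_avg


def sample_low_noise(n, margin, rng):
    """Noise-free labels y = sign(x_0), keeping only points with |x_0| > margin."""
    X = rng.uniform(-1.0, 1.0, size=(4 * n, 2))
    X = X[np.abs(X[:, 0]) > margin][:n]
    return X, np.sign(X[:, 0])


rng = np.random.default_rng(0)
X_train, y_train = sample_low_noise(2000, 0.2, rng)
X_test, y_test = sample_low_noise(1000, 0.2, rng)
phi = make_rff(d=2, n_features=300, bandwidth=1.0, rng=rng)
w = averaged_sgd_logistic(phi(X_train), y_train, step=1.0, reg=1e-4, n_epochs=1, rng=rng)
test_error = np.mean(np.sign(phi(X_test) @ w) != y_test)
print(f"test classification error: {test_error:.3f}")
```

Varying `n_features` in this sketch is one informal way to probe the abstract's claim that the rate does not depend on the number of features; the sketch itself makes no quantitative claim.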


Convex Analysis of the Mean Field Langevin Dynamics

Nitanda, Wu, Suzuki

2022

Preprint


As an example of the nonlinear Fokker-Planck equation, the mean field Langevin dynamics has recently attracted attention due to its connection to (noisy) gradient descent on infinitely wide neural networks in the mean field regime; hence, the convergence properties of the dynamics are of great theoretical interest. In this work, we give a simple and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous- and discrete-time settings. The key ingredient of our proof is a proximal Gibbs distribution $p_q$ associated with the dynamics, which, in combination with techniques from Vempala and Wibisono (2019), allows us to develop a convergence theory parallel to classical results in convex optimization. Furthermore, we reveal that $p_q$ connects to the duality gap in the empirical risk minimization setting, which enables efficient empirical evaluation of the algorithm's convergence.
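For orientation, the objects named in this abstract can be written down as follows. This is a sketch under assumptions, not the paper's full statement: the objective is taken to be a convex functional $F$ of the parameter distribution $q$ plus $\lambda$ times the negative entropy (any $L_2$ regularizer is folded into $F$), $\frac{\delta F}{\delta q}$ denotes the first variation, and the last line is the elementary convexity bound linking suboptimality to the proximal Gibbs distribution.

```latex
\begin{aligned}
\mathcal{L}(q) &= F(q) + \lambda \int q(\theta)\,\log q(\theta)\,d\theta, \\
p_q(\theta) &\propto \exp\!\Big(-\tfrac{1}{\lambda}\,\tfrac{\delta F}{\delta q}(q)(\theta)\Big), \\
d\theta_t &= -\nabla_\theta \tfrac{\delta F}{\delta q}(q_t)(\theta_t)\,dt + \sqrt{2\lambda}\,dW_t,
  \qquad \theta_t \sim q_t, \\
\mathcal{L}(q) - \mathcal{L}(q^\ast) &\le \lambda\,\mathrm{KL}(q \,\|\, p_q)
  \qquad (F \text{ convex in } q).
\end{aligned}
```

The bound follows from $F(q^\ast) \ge F(q) + \int \frac{\delta F}{\delta q}(q)\,(q^\ast - q)$ after substituting $\frac{\delta F}{\delta q}(q) = -\lambda \log p_q + \text{const}$, which leaves $\lambda\,\mathrm{KL}(q\|p_q) - \lambda\,\mathrm{KL}(q^\ast\|p_q)$. The dynamics in the third line is the Wasserstein gradient flow of $\mathcal{L}$, and at the optimum $q^\ast = p_{q^\ast}$.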


Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis

Nitanda, Suzuki

2020

Preprint


We propose the particle dual averaging (PDA) method, which generalizes the dual averaging method in convex optimization to optimization over probability distributions with a quantitative runtime guarantee. The algorithm consists of an inner loop and an outer loop: the inner loop uses the Langevin algorithm to approximately solve for a stationary distribution, which is then optimized in the outer loop. The method can thus be interpreted as an extension of the Langevin algorithm that naturally handles nonlinear functionals on the probability space. An important application of the proposed method is the optimization of two-layer neural networks in the mean field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but for which quantitative convergence rates can be challenging to establish. We show that neural networks in the mean field limit can be globally optimized by PDA. Furthermore, we characterize the convergence rate by leveraging convex optimization theory in finite-dimensional spaces. Our theoretical results are supported by numerical simulations on neural networks of reasonable size.
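The finite-dimensional method that PDA generalizes can be shown in a few lines. This is a minimal sketch, assuming a strongly convex quadratic regularizer $(\lambda/2)\|x\|^2$ and gradient weights proportional to the iteration index; the function name `dual_averaging` is hypothetical. Each step minimizes the weighted average of past linearizations plus the regularizer, which here has the closed form $-\bar g/\lambda$; in PDA the analogous step is a Gibbs distribution that the Langevin inner loop only approximates.

```python
import numpy as np


def dual_averaging(grad_f, lam, x0, T):
    """Weighted dual averaging for min_x f(x) + (lam / 2) * ||x||^2.

    Each step solves argmin_x <g_bar, x> + (lam / 2) * ||x||^2 = -g_bar / lam,
    where g_bar is the weighted average of past gradients of f (weights s = 1, 2, ...).
    """
    x = np.asarray(x0, dtype=float)
    g_sum = np.zeros_like(x)
    w_sum = 0.0
    for s in range(1, T + 1):
        g_sum += s * grad_f(x)      # accumulate weighted gradients of f only
        w_sum += s
        x = -(g_sum / w_sum) / lam  # closed-form regularized argmin
    return x


c = np.array([1.0, -2.0, 3.0])
lam = 0.5
# f(x) = 0.5 * ||x - c||^2, so the regularized minimizer is c / (1 + lam).
x_hat = dual_averaging(lambda x: x - c, lam, x0=np.zeros(3), T=200)
print(x_hat, c / (1 + lam))
```

Only the gradient of $f$ is linearized while the regularizer is kept exact, which mirrors how PDA keeps the entropy term exact and linearizes the nonlinear loss functional.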



