Pratik Chaudhari scite author profile

This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage this observation to construct a local-entropy-based objective function that favors well-generalizable solutions lying in large flat regions of the energy landscape, while avoiding poorly-generalizable solutions located in the sharp valleys. Conceptually, our algorithm resembles two nested loops of SGD where we use Langevin dynamics in the inner loop to compute the gradient of the local entropy before each update of the weights. We show that the new objective has a smoother energy landscape and show improved generalization over SGD using uniform stability, under certain assumptions. Our experiments on convolutional and recurrent networks demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of generalization error and training time.
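The nested-loop structure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `entropy_sgd`, the coupling strength `gamma`, the step sizes, the noise scale, the averaging weight `alpha`, and the toy quadratic loss are all assumptions chosen for readability. The inner loop runs noisy (Langevin-style) gradient steps on the loss plus a quadratic term that keeps the sample near the current weights; the running average of those samples is then used as an estimate of the local-entropy gradient for the outer update.

```python
import math
import random


def entropy_sgd(grad, x0, outer_steps=200, inner_steps=20, lr=0.5,
                inner_lr=0.1, gamma=1.0, noise=1e-3, alpha=0.75, seed=0):
    """Toy sketch of a two-loop, local-entropy-style update (assumed form).

    grad: function mapping a list of weights to the list of loss gradients.
    All hyperparameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(outer_steps):
        xp = list(x)   # inner-loop sample, started at the current weights
        mu = list(x)   # running average of the inner-loop samples
        for _ in range(inner_steps):
            g = grad(xp)
            for i in range(len(xp)):
                # Gradient of loss(xp) + (gamma / 2) * (x - xp)^2 w.r.t. xp.
                drift = g[i] - gamma * (x[i] - xp[i])
                # Langevin-style step: gradient descent plus Gaussian noise.
                xp[i] += (-inner_lr * drift
                          + math.sqrt(inner_lr) * noise * rng.gauss(0.0, 1.0))
                mu[i] = (1.0 - alpha) * mu[i] + alpha * xp[i]
        for i in range(len(x)):
            # Outer step: move the weights toward the averaged sample.
            x[i] -= lr * gamma * (x[i] - mu[i])
    return x


# Hypothetical usage on a toy quadratic loss sum((x_i - 1)^2).
weights = entropy_sgd(lambda x: [2.0 * (xi - 1.0) for xi in x], [5.0, -3.0])
print(weights)
```

On a convex toy loss like this one the method simply finds the minimum; the preference for wide valleys that the abstract describes only matters on non-convex landscapes, which this sketch does not attempt to reproduce.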


Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks

Chaudhari

Soatto

2018

163

237


Stochastic gradient descent (SGD) is widely believed to perform implicit regularization when used to train deep neural networks, but the precise manner in which this occurs has thus far been elusive. We prove that SGD minimizes an average potential over the posterior distribution of weights along with an entropic regularization term. This potential is however not the original loss function in general. So SGD does perform variational inference, but for a different loss than the one used to compute the gradients. Even more surprisingly, SGD does not even converge in the classical sense: we show that the most likely trajectories of SGD for deep networks do not behave like Brownian motion around critical points. Instead, they resemble closed loops with deterministic components. We prove that such "out-of-equilibrium" behavior is a consequence of highly non-isotropic gradient noise in SGD; the covariance matrix of mini-batch gradients for deep networks has a rank as small as 1% of its dimension. We provide extensive empirical validation of these claims, proven in the appendix.


Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

Chaudhari¹,

Choromanska²,

Soatto³

et al. 2016

Preprint

132


Deep relaxation: partial differential equations for optimizing deep neural networks

et al. 2018


In this paper we establish a connection between non-convex optimization methods for training deep neural networks and nonlinear partial differential equations (PDEs). Relaxation techniques arising in statistical physics, which have already been used successfully in this context, are reinterpreted as solutions of a viscous Hamilton-Jacobi PDE. A stochastic control interpretation allows us to prove that the modified algorithm performs better in expectation than stochastic gradient descent. Well-known PDE regularity results allow us to analyze the geometry of the relaxed energy landscape, confirming empirical evidence. The PDE is derived from a stochastic homogenization problem, which arises in the implementation of the algorithm. The algorithms scale well in practice and can effectively tackle the high dimensionality of modern neural networks.


Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

Chaudhari¹,

Soatto²

2017

Preprint


Evaluation of Machine Learning Models for Classifying Upper Extremity Exercises Using Inertial Measurement Unit-Based Kinematic Data

Hua

Chaudhari

Johnson

et al. 2020

IEEE J. Biomed. Health Inform.


The amount of home-based exercise prescribed by a physical therapist is difficult to monitor. However, the integration of wearable inertial measurement unit (IMU) devices can aid in monitoring home exercise by analyzing exercise biomechanics. The objective of this study is to evaluate machine learning models for classifying nine different upper extremity exercises, based upon kinematic data captured from an IMU-based device. Fifty participants performed one compound and eight isolation exercises with their right arm. Each exercise was performed ten times for a total of 4500 trials. Joint angles were calculated using IMUs that were placed on the hand, forearm, upper arm, and torso. Various machine learning models were developed with different algorithms and train-test splits. Random forest models with flattened kinematic data as a feature had the greatest accuracy (98.6%). Using triaxial joint range of motion as the feature set resulted in decreased accuracy (91.9%) with faster speeds. Accuracy did not decrease below 90% until training size was decreased to 5% from 50%. Accuracy decreased (88.7%) when splitting data by participant. Upper extremity exercises can be classified accurately using kinematic data from a wearable IMU device. A random forest classification model was developed that quickly and accurately classified exercises. Sampling frequency and lower training splits had a modest effect on …


Deep Relaxation: partial differential equations for optimizing deep neural networks

Chaudhari¹,

Oberman²,

Osher³

et al. 2017

Preprint


Incremental sampling-based algorithm for minimum-violation motion planning

Castro

Chaudhari

Tůmová

et al. 2013


This paper studies the problem of control strategy synthesis for dynamical systems with differential constraints to fulfill a given reachability goal while satisfying a set of safety rules. Particular attention is devoted to goals that become feasible only if a subset of the safety rules are violated. The proposed algorithm computes a control law that minimizes the level of unsafety while the desired goal is guaranteed to be reached. This problem is motivated by an autonomous car navigating an urban environment while following rules of the road such as "always travel in right lane" and "do not change lanes frequently". Ideas behind sampling-based motion-planning algorithms, such as Probabilistic Road Maps (PRMs) and Rapidly-exploring Random Trees (RRTs), are employed to incrementally construct a finite concretization of the dynamics as a durational Kripke structure. In conjunction with this, a weighted finite automaton that captures the safety rules is used in order to find an optimal trajectory that minimizes the violation of safety rules. We prove that the proposed algorithm guarantees asymptotic optimality, i.e., almost-sure convergence to optimal solutions. We present results of simulation experiments and an implementation on an autonomous urban mobility-on-demand system.


