This paper explores the complexity of deep feedforward networks with linear presynaptic couplings and rectified linear activations. This is a contribution to the growing body of work contrasting the representational power of deep and shallow network architectures. In particular, we offer a framework, based on computational geometry, for comparing deep and shallow models that belong to the family of piecewise linear functions. We look at a deep rectifier multi-layer perceptron (MLP) with linear output units and compare it with a single-layer version of the model. In the asymptotic regime, when the number of inputs stays constant, if the shallow model has $kn$ hidden units and $n_0$ inputs, then the number of linear regions is $O(k^{n_0} n^{n_0})$. For a $k$-layer model with $n$ hidden units on each layer it is $\Omega\left((n/n_0)^{k-1} n^{n_0}\right)$. The number $(n/n_0)^{k-1}$ grows faster than $k^{n_0}$ when $n$ tends to infinity or when $k$ tends to infinity and $n \geq 2n_0$. Additionally, even when $k$ is small, if we restrict $n$ to be $2n_0$, we can show that a deep model has considerably more linear regions than a shallow one. We consider this a first step towards understanding the complexity of these models and, specifically, towards providing suitable mathematical tools for future analysis.
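As a quick empirical companion to these counts, the following sketch (Python/NumPy; the network sizes and weights are hypothetical, not taken from the paper) lower-bounds the number of linear regions of a small rectifier MLP by counting the distinct ReLU activation patterns realized on a dense sample of inputs; each activation pattern fixes one linear piece of the computed function.

import numpy as np

rng = np.random.default_rng(0)
n0, n, k = 2, 8, 3  # inputs, hidden units per layer, number of layers (assumed sizes)

# Random weights for a k-layer rectifier MLP.
Ws = [rng.standard_normal((n, n0))] + [rng.standard_normal((n, n)) for _ in range(k - 1)]
bs = [rng.standard_normal(n) for _ in range(k)]

def activation_pattern(x):
    """Return the on/off pattern of every ReLU unit at input x."""
    pattern = []
    for W, b in zip(Ws, bs):
        pre = W @ x + b
        pattern.append(tuple(pre > 0))
        x = np.maximum(pre, 0)
    return tuple(pattern)

# Distinct patterns found on a sample lower-bound the number of linear regions.
samples = rng.uniform(-3.0, 3.0, size=(50_000, n0))
print(len({activation_pattern(x) for x in samples}))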
We improve recently published results about the resources of restricted Boltzmann machines (RBM) and deep belief networks (DBN) required to make them universal approximators. We show that any distribution $p$ on the set $\{0,1\}^n$ of binary vectors of length $n$ can be arbitrarily well approximated by an RBM with $k-1$ hidden units, where $k$ is the minimal number of pairs of binary vectors such that the conjunction of the pairs equals the support set of $p$.
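To make the object of this approximation statement concrete, here is a minimal sketch (Python/NumPy; the parameters W, b, c are random placeholders) that computes the exact distribution p(v) of a small RBM by summing out the hidden units in closed form:

import itertools
import numpy as np

def rbm_marginal(W, b, c):
    """Exact p(v) of an RBM: p(v) ∝ exp(b·v) · prod_j (1 + exp(c_j + v·W_j))."""
    n = len(b)
    vs = list(itertools.product([0, 1], repeat=n))
    logp = np.array([b @ v + np.sum(np.logaddexp(0.0, c + v @ W))
                     for v in map(np.array, vs)])
    p = np.exp(logp - logp.max())
    return vs, p / p.sum()

# Hypothetical parameters: 3 visible and 2 hidden units.
rng = np.random.default_rng(1)
W, b, c = rng.normal(size=(3, 2)), rng.normal(size=3), rng.normal(size=2)
for v, pv in zip(*rbm_marginal(W, b, c)):
    print(v, f"{pv:.4f}")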
We study a natural Wasserstein gradient flow on manifolds of probability distributions with discrete sample spaces. We derive the Riemannian structure for the probability simplex from the dynamical formulation of the Wasserstein distance on a weighted graph. We pull back the geometric structure to the parameter space of any given probability model, which allows us to define a natural gradient flow there. In contrast to the natural Fisher-Rao gradient, the natural Wasserstein gradient incorporates a ground metric on sample space. We illustrate the analysis with elementary exponential family examples and demonstrate an application of the Wasserstein natural gradient to maximum likelihood estimation.
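The following sketch illustrates one way to realize such an update numerically (assumptions: a complete graph with unit edge weights, the arithmetic mean (p_i + p_j)/2 as the edge density average, and a finite-difference Jacobian; these choices and function names are illustrative, not the paper's exact construction):

import numpy as np

def laplacian(p):
    """Weighted graph Laplacian L(p) on the complete graph over states,
    with edge weights (p_i + p_j)/2 (arithmetic-mean averaging)."""
    w = (p[:, None] + p[None, :]) / 2.0
    np.fill_diagonal(w, 0.0)
    return np.diag(w.sum(axis=1)) - w

def wasserstein_natural_step(theta, p_fn, loss_grad_p, lr=0.1, eps=1e-6):
    """One natural-gradient step with the pulled-back metric G = J^T L(p)^+ J."""
    p = p_fn(theta)
    # Jacobian dp/dtheta by forward differences (illustration only).
    J = np.stack([(p_fn(theta + eps * e) - p) / eps
                  for e in np.eye(len(theta))], axis=1)
    G = J.T @ np.linalg.pinv(laplacian(p)) @ J
    return theta - lr * np.linalg.pinv(G) @ (J.T @ loss_grad_p(p))

Here p_fn maps parameters to a point on the simplex and loss_grad_p is the Euclidean gradient of the objective in the distribution; the pseudoinverses handle the rank deficiency of the Laplacian on the simplex.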
We derive relations between theoretical properties of restricted Boltzmann machines (RBMs), popular machine learning models which form the building blocks of deep learning models, and several natural notions from discrete mathematics and convex geometry. We give implications and equivalences relating RBM-representable probability distributions, perfectly reconstructible inputs, Hamming modes, zonotopes and zonosets, point configurations in hyperplane arrangements, linear threshold codes, and multi-covering numbers of hypercubes. As a motivating application, we prove results on the relative representational power of mixtures of product distributions and products of mixtures of pairs of product distributions (RBMs) that formally justify widely held intuitions about distributed representations. In particular, we show that representing the probability distributions obtainable as products of mixtures can require a mixture of products with exponentially many more parameters.
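For reference, the product-of-mixtures reading of the RBM marginal alluded to here is the standard identity (with visible bias $b$, hidden biases $c_j$, and weight columns $w_j$; notation assumed):

\[
p(v) \;=\; \frac{1}{Z}\, e^{b^\top v} \prod_{j=1}^{m} \left(1 + e^{\,c_j + w_j^\top v}\right), \qquad v \in \{0,1\}^n,
\]

where, after normalization, each factor is a mixture of two product distributions, corresponding to the $j$-th hidden unit being off or on.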
We present a framework for designing cheap control architectures for embodied agents. Our derivation is guided by the classical problem of universal approximation, whereby we explore the possibility of exploiting the agent's embodiment for a new and more efficient universal approximation of behaviors generated by sensorimotor control. This embodied universal approximation is compared with the classical non-embodied universal approximation. To exemplify our approach, we present a detailed quantitative case study for policy models defined in terms of conditional restricted Boltzmann machines. In contrast to non-embodied universal approximation, which requires an exponential number of parameters, in the embodied setting we are able to generate all possible behaviors with a drastically smaller model, thus obtaining cheap universal approximation. We test and corroborate the theory experimentally with a six-legged walking machine. The experiments indicate that the controller complexity predicted by our theory is close to the minimal sufficient value, which means that the theory has direct practical implications.
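As an illustration of the kind of policy model used in the case study, the sketch below (Python/NumPy; variable names, sizes, and the number of Gibbs steps are assumptions, not the paper's setup) samples a binary action from a conditional RBM given a sensor state via block Gibbs sampling:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def crbm_sample_action(s, U, V, W, b, c, steps=50, rng=None):
    """Sample an action a ~ p(a | s) from a conditional RBM by block Gibbs.
    U: state-to-hidden, V: state-to-action, W: action-to-hidden couplings."""
    rng = rng or np.random.default_rng()
    a = rng.integers(0, 2, size=W.shape[0])  # random initial action
    for _ in range(steps):
        h = rng.random(W.shape[1]) < sigmoid(c + U.T @ s + W.T @ a)
        a = rng.random(W.shape[0]) < sigmoid(b + V.T @ s + W @ h)
    return a.astype(int)

# Example usage (hypothetical sizes): 4 sensor bits, 3 action bits, 5 hidden units.
rng = np.random.default_rng(0)
U, V, W = rng.normal(size=(4, 5)), rng.normal(size=(4, 3)), rng.normal(size=(3, 5))
print(crbm_sample_action(rng.integers(0, 2, 4), U, V, W, np.zeros(3), np.zeros(5)))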
The restricted Boltzmann machine is a network of stochastic units with undirected interactions between pairs of visible and hidden units. This model was popularized as a building block of deep learning architectures and has continued to play an important role in applied and theoretical machine learning. Restricted Boltzmann machines carry a rich structure, with connections to geometry, applied algebra, probability, statistics, machine learning, and other areas. The analysis of these models is attractive in its own right and also as a platform to combine and generalize mathematical tools for graphical models with hidden variables. This article gives an introduction to the mathematical analysis of restricted Boltzmann machines, reviews recent results on the geometry of the sets of probability distributions representable by these models, and suggests a few directions for further investigation.
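In the standard parametrization (notation assumed here), the model in question assigns probabilities

\[
p(v, h) \;=\; \frac{1}{Z} \exp\!\left(v^\top W h + b^\top v + c^\top h\right), \qquad v \in \{0,1\}^n, \; h \in \{0,1\}^m,
\]

where $W$ couples visible and hidden units, $b$ and $c$ are bias vectors, and $Z$ is the partition function; the sets of visible marginals of such models are the geometric objects studied in the article.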
In the context of embodied artificial intelligence, morphological computation refers to processes conducted by the body (and environment) that would otherwise have to be performed by the brain. Exploiting environmental and morphological properties is an important feature of embodied systems, mainly because it allows the controller complexity to be reduced significantly. An important aspect of morphological computation is that it cannot be assigned to an embodied system per se, but is, as we show, behavior- and state-dependent. In this work, we evaluate two different measures of morphological computation that can be applied in robotic systems and in computer simulations of biological movement. As an example, these measures were evaluated on muscle- and DC-motor-driven hopping models. We show that a state-dependent analysis of the hopping behaviors provides additional insights that cannot be gained from the averaged measures alone. This work includes algorithms and computer code for the measures.
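As a hedged illustration of this kind of measure, the sketch below (Python) computes a generic plug-in estimate of the conditional mutual information I(W'; W | A) between consecutive world states given the action, one information-theoretic formulation used in this literature; the paper's exact estimators may differ.

import numpy as np
from collections import Counter

def conditional_mutual_information(w_next, w, a):
    """Plug-in estimate of I(W'; W | A) in bits from discrete trajectories,
    given as equal-length sequences of hashable symbols."""
    n = len(w)
    p_wwa = Counter(zip(w_next, w, a))   # joint counts of (w', w, a)
    p_wa = Counter(zip(w, a))
    p_wna = Counter(zip(w_next, a))
    p_a = Counter(a)
    mi = 0.0
    for (wn, wi, ai), c in p_wwa.items():
        # (c/n * p(a)) / (p(w,a) * p(w',a)) with all counts normalized by n
        mi += (c / n) * np.log2(c * p_a[ai] / (p_wa[(wi, ai)] * p_wna[(wn, ai)]))
    return mi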