We address the role of noise and the issue of efficient computation in stochastic optimal control problems. We consider a class of nonlinear control problems that can be formulated as a path integral and where the noise plays the role of temperature. The path integral displays symmetry breaking and there exists a critical noise value that separates regimes where optimal control yields qualitatively different solutions. The path integral can be computed efficiently by Monte Carlo integration or by a Laplace approximation, and can therefore be used to solve high dimensional stochastic control problems.
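To make the Monte Carlo route concrete, the sketch below estimates the optimal control at the initial state by sampling uncontrolled trajectories and weighting each by the exponential of minus its path cost divided by the noise level. This is an illustrative one-dimensional sketch under assumed interfaces (pi_control_estimate, f, V, phi, lam are our names), not code from the paper.

```python
import numpy as np

def pi_control_estimate(f, V, phi, x0, T, dt, lam, sigma,
                        n_samples=10_000, rng=None):
    """Monte Carlo estimate of the path-integral optimal control at (x0, t=0).

    Samples uncontrolled trajectories dx = f(x) dt + sigma dW, weights each
    path by exp(-S / lam) with S = phi(x_T) + sum_k V(x_k) dt, and returns a
    noise-weighted estimate of the initial control. The 1-d setup and the
    estimator details are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(T / dt))
    x = np.full(n_samples, float(x0))
    S = np.zeros(n_samples)                 # accumulated path cost per sample
    first_noise = np.zeros(n_samples)
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), n_samples)
        if k == 0:
            first_noise = sigma * dW        # kept for the control estimator
        S += V(x) * dt
        x = x + f(x) * dt + sigma * dW
    S += phi(x)                             # terminal cost
    w = np.exp(-(S - S.min()) / lam)        # stabilized importance weights
    w /= w.sum()
    return float(w @ first_noise) / dt      # estimate of u*(x0, 0)

# Toy usage: pure diffusion with quadratic end cost pulls x toward 0,
# so the estimated initial control at x0 = 1 should be negative.
u0 = pi_control_estimate(f=lambda x: np.zeros_like(x),
                         V=lambda x: np.zeros_like(x),
                         phi=lambda x: x**2,
                         x0=1.0, T=1.0, dt=0.01, lam=1.0, sigma=1.0)
```

Subtracting the minimum path cost before exponentiating leaves the normalized weights unchanged but avoids numerical underflow when costs are large.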
This paper considers linear-quadratic control of a non-linear dynamical system subject to arbitrary cost. I show that for this class of stochastic control problems the non-linear Hamilton-Jacobi-Bellman equation can be transformed into a linear equation. The transformation is similar to the transformation used to relate the classical Hamilton-Jacobi equation to the Schrödinger equation. As a result of the linearity, the usual backward computation can be replaced by a forward diffusion process that can be computed by stochastic integration or by the evaluation of a path integral. It is shown how, in the deterministic limit, the Pontryagin minimum principle (PMP) formalism is recovered. The significance of the path integral approach is that it forms the basis for a number of efficient computational methods, such as Monte Carlo (MC) sampling, the Laplace approximation and the variational approximation. We show the effectiveness of the first two methods in a number of examples. Examples are given that show the qualitative difference between stochastic and deterministic control and the occurrence of symmetry breaking as a function of the noise.
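For reference, the transformation has the following standard form in path integral control (our notation; signs and symbols may differ from the paper's). For dynamics $dx = (f(x,t)+u)\,dt + d\xi$ with noise covariance $\langle d\xi\, d\xi^\top \rangle = \nu\, dt$, quadratic control cost $\tfrac12 u^\top R u$ and state cost $V$, the stochastic Hamilton-Jacobi-Bellman equation reads

$$
-\partial_t J = \min_u\left[\tfrac12 u^\top R u + V + (f+u)^\top \nabla_x J + \tfrac12 \operatorname{Tr}\!\big(\nu\, \nabla_x^2 J\big)\right].
$$

Substituting $J = -\lambda \log \psi$, with $\lambda$ defined by $\nu = \lambda R^{-1}$, cancels the term quadratic in $\nabla_x J$ and yields the linear equation

$$
\partial_t \psi = \left(\frac{V}{\lambda} - f^\top \nabla_x - \tfrac12 \operatorname{Tr}\!\big(\nu\, \nabla_x^2\big)\right)\psi,
$$

with terminal condition $\psi(x,T) = \exp(-\phi(x)/\lambda)$ for end cost $\phi$. By the Feynman-Kac formula, $\psi$ is an expectation over the forward diffusion $dx = f\,dt + d\xi$, which is the path integral referred to above, and the optimal control is $u^* = \nu\, \nabla_x \log \psi$.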
We reformulate a class of non-linear stochastic optimal control problems introduced by Todorov (in Advances in Neural Information Processing Systems, vol. 19, pp. 1369–1376, 2007) as a Kullback-Leibler (KL) minimization problem. As a result, the optimal control computation reduces to an inference computation and approximate inference methods can be applied to efficiently compute approximate optimal controls. We show how this KL control theory contains the path integral control method as a special case. We provide an example of a block stacking task and a multi-agent cooperative game where we demonstrate how approximate inference can be successfully applied to instances that are too complex for exact computation. We discuss the relation of the KL control approach to other inference approaches to control.
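Schematically, the reduction reads as follows (our notation, a hedged sketch rather than the paper's exact statement). With $q(\tau)$ the path distribution of the uncontrolled dynamics and $C(\tau)$ a path cost, the control problem is

$$
\min_p\; \mathbb{E}_p[C(\tau)] + \operatorname{KL}(p\,\|\,q), \qquad \text{minimized by} \qquad p^*(\tau) = \frac{q(\tau)\, e^{-C(\tau)}}{Z},
$$

with optimal cost $-\log Z$. Computing optimal controls thus amounts to computing marginals of the Boltzmann-like distribution $p^*$, which is exactly where approximate inference methods enter.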
The learning process in Boltzmann Machines is computationally very expensive. The computational complexity of the exact algorithm is exponential in the number of neurons. We present a new approximate learning algorithm for Boltzmann Machines, which is based on mean field theory and the linear response theorem. The computational complexity of the algorithm is cubic in the number of neurons. In the absence of hidden units, we show how the weights can be directly computed from the fixed point equation of the learning rules. Thus, in this case we do not need to use a gradient descent procedure for the learning process. We show that the solutions of this method are close to the optimal solutions and give a significant improvement when correlations play a significant role. Finally, we apply the method to a pattern completion task and show good performance for networks up to 100 neurons.
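A minimal sketch of the hidden-unit-free case described above: the weights follow directly from the linear response relation between the inverse correlation matrix and the couplings, so the dominant cost is one matrix inversion (cubic in the number of neurons). Function and variable names are ours, and the handling of the diagonal is an assumption.

```python
import numpy as np

def mf_lr_weights(samples):
    """Direct mean-field / linear-response fit of a fully visible
    Boltzmann machine from +/-1 data (rows are samples), with no
    gradient descent; a sketch, not the paper's code.
    """
    m = samples.mean(axis=0)                 # magnetizations <s_i>
    C = np.cov(samples, rowvar=False)        # connected correlations C_ij
    Cinv = np.linalg.inv(C)                  # the cubic-cost step
    # Linear-response fixed point: (C^-1)_ij = delta_ij / (1 - m_i^2) - w_ij
    W = np.diag(1.0 / (1.0 - m**2)) - Cinv
    np.fill_diagonal(W, 0.0)                 # no self-couplings (assumption)
    theta = np.arctanh(m) - W @ m            # mean-field equation for biases
    return W, theta

# Toy usage with random +/-1 data.
rng = np.random.default_rng(0)
data = rng.choice([-1.0, 1.0], size=(2000, 10))
W, theta = mf_lr_weights(data)
```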
We consider the problems of learning the optimal action-value function and the optimal policy in discounted-reward Markov decision processes (MDPs). We prove new PAC bounds on the sample complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative model of the MDP: value iteration and policy iteration. The first result indicates that for an MDP with N state-action pairs and discount factor γ ∈ [0, 1), only O(N log(N/δ)/((1 − γ)^3 ε^2)) state-transition samples are required to find an ε-optimal estimate of the action-value function with probability (w.p.) 1 − δ. Further, we prove that, for small values of ε, an order of O(N log(N/δ)/((1 − γ)^3 ε^2)) samples is required to find an ε-optimal policy w.p. 1 − δ. We also prove a matching lower bound of Θ(N log(N/δ)/((1 − γ)^3 ε^2)) on the sample complexity of estimating the optimal action-value function with ε accuracy. To the best of our knowledge, this is the first minimax result on the sample complexity of RL: the upper bounds match the lower bound in terms of N, ε, δ and 1/(1 − γ) up to a constant factor. Also, both our lower bound and upper bound improve on the state of the art in terms of their dependence on 1/(1 − γ).
We derive novel conditions that guarantee convergence of the Sum-Product algorithm (also known as Loopy Belief Propagation or simply Belief Propagation) to a unique fixed point, irrespective of the initial messages. The computational complexity of the conditions is polynomial in the number of variables. In contrast with previously existing conditions, our results are directly applicable to arbitrary factor graphs (with discrete variables) and are shown to be valid also in the case of factors containing zeros, under some additional conditions. We compare our bounds with existing ones, numerically and, if possible, analytically. For binary variables with pairwise interactions, we derive sufficient conditions that take into account local evidence (i.e. single variable factors) and the type of pair interactions (attractive or repulsive). It is shown empirically that this bound outperforms existing bounds.
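For concreteness, here is a minimal sum-product implementation for the binary pairwise case analyzed in the last part of the abstract. The message parametrization and names are a generic textbook choice, not the paper's, and the paper's factor graph setting is more general.

```python
import numpy as np

def loopy_bp_binary(J, theta, max_iters=500, tol=1e-9):
    """Sum-product (loopy belief propagation) for a binary pairwise model
    p(s) ∝ exp(sum_i theta_i s_i + sum_{i<j} J_ij s_i s_j), s_i in {-1,+1}.

    Messages are stored as log-odds nu[i, j] (message from i to j).
    Returns the estimated magnetizations <s_i>.
    """
    n = len(theta)
    nu = np.zeros((n, n))
    for _ in range(max_iters):
        incoming = theta + nu.sum(axis=0)     # theta_i + sum_k nu[k, i]
        cavity = incoming[:, None] - nu.T     # exclude the j -> i message
        nu_new = np.arctanh(np.tanh(J) * np.tanh(cavity))
        np.fill_diagonal(nu_new, 0.0)
        done = np.max(np.abs(nu_new - nu)) < tol
        nu = nu_new
        if done:                              # fixed point reached
            break
    return np.tanh(theta + nu.sum(axis=0))    # belief log-odds -> <s_i>

# Toy usage: a small 3-cycle with weak attractive couplings.
J = 0.2 * (np.ones((3, 3)) - np.eye(3))
theta = np.array([0.5, 0.0, -0.5])
print(loopy_bp_binary(J, theta))
```

Whether iterations like these reach a unique fixed point regardless of the initial messages is exactly what the conditions derived in the paper certify.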
We have examined the role of dynamic synapses in stochastic Hopfield-like network behavior. Our results demonstrate the appearance of a novel phase characterized by quick transitions from one memory state to another. The network is able to retrieve memorized patterns corresponding to classical ferromagnetic states, but switches between memorized patterns with an intermittent type of behavior. This phenomenon might reflect the flexibility of real neural systems and their readiness to receive and respond to novel and changing external stimuli.
Control theory is a mathematical description of how to act optimally to gain future rewards. In this paper I give an introduction to deterministic and stochastic control theory and I give an overview of the possible application of control theory to the modeling of animal behavior and learning. I discuss a class of non-linear stochastic control problems that can be efficiently solved using a path integral or by Monte Carlo sampling. In this control formalism the central concept of cost-to-go becomes a free energy, and methods and concepts from statistical physics can be readily applied.
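The free-energy statement can be written compactly (our notation, a sketch consistent with the path integral formulation above): the optimal cost-to-go is $\lambda$ times the negative log of a partition-function-like expectation over uncontrolled paths, with the noise level $\lambda$ acting as temperature,

$$
J(x,t) = -\lambda \log \mathbb{E}\left[\exp\!\left(-\frac{1}{\lambda}\Big(\phi(x_T) + \int_t^T V(x_s,s)\,ds\Big)\right)\right],
$$

where the expectation is over trajectories of the uncontrolled dynamics started at $x$ at time $t$, $V$ is the state cost and $\phi$ the end cost.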