2009
DOI: 10.1007/s10514-009-9130-2
A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot

Abstract: We address the problem of online path planning for optimal sensing with a mobile robot. The objective of the robot is to learn the most about its pose and the environment given time constraints. We use a POMDP with a utility function that depends on the belief state to model the finite horizon planning problem. We replan as the robot progresses throughout the environment. The POMDP is high-dimensional, continuous, non-differentiable, nonlinear, non-Gaussian and must be solved in real-time. Most existing techni…
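The replanning loop the abstract describes — evaluate candidate actions against a belief-dependent utility, execute one, update the belief, and replan — can be sketched as follows. This is a minimal illustration with a hypothetical particle belief, motion model, landmark, and measurement update, not the paper's actual POMDP solver:

```python
import numpy as np

rng = np.random.default_rng(0)
LANDMARK = np.array([1.5, 0.0])   # hypothetical feature the robot can localize against

def step(particles, action):
    """Hypothetical noisy motion model applied to every belief particle."""
    return particles + action + rng.normal(0.0, 0.05, size=particles.shape)

def observe(particles):
    """Hypothetical measurement update: within sensor range of the landmark,
    the observation contracts the belief toward its mean."""
    m = particles.mean(axis=0)
    if np.linalg.norm(m - LANDMARK) < 1.0:
        return m + 0.2 * (particles - m)
    return particles

def utility(particles):
    """Belief-dependent utility: reward low pose uncertainty."""
    return -np.trace(np.cov(particles.T))

def plan_one_step(particles, candidate_actions):
    """Greedy one-step lookahead: simulate each action and pick the one
    whose predicted belief has the highest utility."""
    return max(candidate_actions,
               key=lambda a: utility(observe(step(particles, a))))

# Receding horizon: plan, execute one action, update the belief, replan.
belief = rng.normal(0.0, 1.0, size=(100, 2))      # particle belief over 2-D pose
actions = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.0, 0.0])]
for t in range(3):
    a = plan_one_step(belief, actions)
    belief = observe(step(belief, a))
```

The one-step lookahead stands in for the finite-horizon optimization; replanning after each executed action is what makes the loop online.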


Cited by 187 publications (142 citation statements)
References 40 publications
“…For the same sampling budget, Kriging interpolation is far sharper, with no additional parameter to tune. For each action a_t, the expected return could be estimated by averaging the return of a set of trajectories generated according to the pose-uncertainty distribution, as in [7]. The return of a single trajectory is considered here, since no pose uncertainty is assumed during the learning stage.…”
Section: Results
confidence: 99%
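The averaging strategy this quotation attributes to [7] — estimating an action's expected return by sampling start poses from the pose-uncertainty distribution and averaging per-trajectory returns — can be sketched as follows. The return and uncertainty models here are hypothetical stand-ins for a full trajectory simulation:

```python
import numpy as np

rng = np.random.default_rng(1)

def rollout_return(start_pose, action):
    """Hypothetical return of a single trajectory: negative distance
    to a goal after applying the action (stands in for a simulator)."""
    goal = np.array([2.0, 0.0])
    return -np.linalg.norm(start_pose + action - goal)

def expected_return(action, pose_mean, pose_cov, n_samples=200):
    """Monte Carlo estimate: average the return over start poses drawn
    from the pose-uncertainty distribution."""
    poses = rng.multivariate_normal(pose_mean, pose_cov, size=n_samples)
    return np.mean([rollout_return(p, action) for p in poses])

pose_mean = np.zeros(2)
pose_cov = 0.1 * np.eye(2)
actions = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
best = max(actions, key=lambda a: expected_return(a, pose_mean, pose_cov))
```

Using a single trajectory, as the quoted work does, corresponds to `n_samples=1` with the pose fixed at its mean — cheaper, and unbiased only when pose uncertainty is negligible during learning.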
“…The objective is to find the best approximation of the action-value function within a small sampling budget. In [7], Kriging and Bayesian optimization have been used to address a simultaneous localization and mapping problem under time and energy constraints. In the present paper, similar strategies are investigated to tackle the problem of viewpoint planning for active recognition.…”
Section: Introduction
confidence: 99%
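The idea in this quotation — approximating an action-value function with a Kriging (Gaussian-process) surrogate under a small sampling budget, spending new evaluations where the surrogate is most uncertain — can be sketched in pure NumPy. The squared-exponential kernel, length scale, and toy value function below are illustrative assumptions, not the cited implementation:

```python
import numpy as np

def rbf(a, b, ell=0.5):
    """Squared-exponential (RBF) kernel between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Kriging predictor: posterior mean and variance of a zero-mean GP."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_query, x_train)
    Kss = rbf(x_query, x_query)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

# Small sampling budget: a handful of evaluated actions (x) and their returns (y).
q = lambda x: -(x - 0.3) ** 2              # hypothetical true action-value function
x_train = np.array([0.0, 0.5, 1.0])
y_train = q(x_train)

# Budget-limited refinement: greedily evaluate where the surrogate is most uncertain.
xs = np.linspace(0.0, 1.0, 101)
for _ in range(3):
    _, var = gp_posterior(x_train, y_train, xs)
    x_new = xs[np.argmax(var)]
    x_train = np.append(x_train, x_new)
    y_train = np.append(y_train, q(x_new))

mean, _ = gp_posterior(x_train, y_train, xs)
best_action = xs[np.argmax(mean)]
```

A full Bayesian-optimization loop would replace the pure-variance criterion with an acquisition function such as expected improvement, trading exploration of uncertain actions against exploitation of the current best estimate.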
“…In [40], the authors use this kind of technique to optimize motion of a robot in order to minimize uncertainty of localization. They predict the information gain and the cost using the prior and so dynamically solve the exploration/exploitation dilemma in the context of path planning.…”
Section: Active Exploration While Learning
confidence: 99%
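The exploration/exploitation trade described in this quotation — predicting both information gain and cost from the prior, then choosing where to move — can be sketched with a simple scalarized utility over candidate viewpoints. The discrete belief, candidate posteriors, and trade-off weight are hypothetical:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete belief (in nats)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_info_gain(belief, posterior):
    """Predicted reduction in belief entropy if the candidate is visited."""
    return entropy(belief) - entropy(posterior)

def choose(candidates, belief, trade_off=0.5):
    """Pick the candidate maximizing info gain minus weighted travel cost."""
    return max(candidates,
               key=lambda c: expected_info_gain(belief, c["posterior"])
                             - trade_off * c["cost"])

belief = np.array([0.25, 0.25, 0.25, 0.25])     # maximally uncertain prior
candidates = [
    {"name": "far_informative",   "cost": 2.0,
     "posterior": np.array([0.9, 0.1, 0.0, 0.0])},
    {"name": "near_uninformative", "cost": 0.2,
     "posterior": np.array([0.3, 0.3, 0.2, 0.2])},
]
best = choose(candidates, belief)
```

Raising `trade_off` makes travel cost dominate, flipping the choice toward the cheap but uninformative candidate — which is exactly the dilemma being solved dynamically in the cited work.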
“…Reinforcement Learning (RL) is a notable paradigm for robot control with several reported successes in recent years (Tedrake et al., 2005; Abbeel et al., 2007; Kober and Peters, 2009; Riedmiller et al., 2009; Martinez-Cantin et al., 2009). In this paper we address the problem of model-free RL, in which the goal is to learn a controller without knowledge of the dynamics of the system (Bertsekas and Tsitsiklis, 1996; Sutton and Barto, 1998).…”
Section: Introduction
confidence: 99%