Robotics: Science and Systems XVII 2021
DOI: 10.15607/rss.2021.xvii.007

Robust Value Iteration for Continuous Control Tasks

Abstract: When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well. Commonly, the optimal policy overfits to the approximate model and the corresponding state distribution, often resulting in failure to transfer under distributional shifts. In this paper, we present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain and incorporates adversarial perturbations…
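The abstract sketches a dynamic-programming procedure: compute the optimal value function over a compact state domain while an adversary perturbs the dynamics. The snippet below is a minimal illustrative sketch of that general idea, assuming a finite grid over the state domain, finite candidate action and disturbance sets, and user-supplied `step` and `reward` functions; it is not the paper's implementation.

```python
# Minimal sketch of robust value iteration on a discretized (compact) state
# domain. The grid, the finite action/disturbance sets, and the step/reward
# interface are illustrative assumptions, not the paper's implementation.
import numpy as np

def robust_value_iteration(states, actions, disturbances, step, reward,
                           gamma=0.99, n_iters=200, tol=1e-6):
    """states: (S, d) grid on a compact domain; actions, disturbances: finite sets.
    step(x, u, w) returns the successor state; reward(x, u) the stage reward."""
    V = np.zeros(len(states))

    def nearest(x):
        # Project a successor state back onto the grid (nearest neighbour).
        return np.argmin(np.linalg.norm(states - x, axis=1))

    for _ in range(n_iters):
        V_new = np.empty_like(V)
        for i, x in enumerate(states):
            # Maximize over actions the worst case (minimum) over disturbances.
            V_new[i] = max(
                min(reward(x, u) + gamma * V[nearest(step(x, u, w))]
                    for w in disturbances)
                for u in actions
            )
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

The nearest-neighbour projection merely stands in for whatever interpolation or function approximation is used on the continuous domain; the essential point is the max-min Bellman backup.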

Cited by 12 publications (7 citation statements) · References 36 publications
“…Their sim-to-sim evaluation on four MuJoCo tasks showed that agents trained with the suggested adversarial randomization generalize slightly better to domain parameter configurations than agents trained with a static randomization scheme. Lutter et al (2021a) derived the optimal policy together with different optimal disturbances from the value function in a continuous state, action, and time RL setting. Despite outstanding sim-to-real transferability of the resulting policies, the presented approach is conceptually restricted by assuming access to a compact representation of the state domain, typically obtained through exhaustive sampling, which hinders the scalability to high-dimensional tasks.…”
Section: Domain Randomization for Sim-to-Real Transfer
Citation type: mentioning, confidence: 99%
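The statement above notes that the cited work derives the optimal policy together with the optimal disturbances directly from the value function in a continuous state, action, and time setting. As a rough illustration of how such closed-form expressions can arise, the sketch below assumes control-affine dynamics dx/dt = a(x) + B(x)u + w, an instantaneous reward r(x) - uᵀRu, and a norm-bounded disturbance; these modelling choices and function names are assumptions, not the cited derivation.

```python
# Hedged sketch: greedy action and worst-case disturbance obtained from the
# value-function gradient, assuming control-affine dynamics
# dx/dt = a(x) + B(x) u + w, reward r(x) - u^T R u, and ||w|| <= w_max.
import numpy as np

def greedy_action_and_disturbance(x, grad_V, B, R, w_max):
    """grad_V(x): gradient of the value function at x; B(x): input matrix;
    R: positive-definite action penalty; w_max: disturbance norm bound."""
    dV = grad_V(x)
    # Policy: maximizing the Hamiltonian over u yields a closed-form action.
    u_star = 0.5 * np.linalg.solve(R, B(x).T @ dV)
    # Adversary: the bounded disturbance that decreases the value fastest.
    norm = np.linalg.norm(dV)
    w_star = -w_max * dV / norm if norm > 1e-9 else np.zeros_like(dV)
    return u_star, w_star
```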
“…This is important because the location component of system states in our DP framework is a continuous variable. For continuous-state DP problems, two common approaches for solving the problem numerically are either to approximate the optimal cost functions (e.g., by using least-squares regression or neural networks) or to discretize the state space [40]- [42].…”
Section: Implementation and Complexity Analysis
Citation type: mentioning, confidence: 99%
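To make the first of the two numerical strategies mentioned above concrete, here is a brief sketch of fitted value iteration that approximates the optimal cost function with least-squares regression over state features; the sampled states, feature map, cost, and transition interface are placeholder assumptions.

```python
# Illustrative sketch of continuous-state DP via function approximation:
# a linear-in-features cost-to-go fitted by least-squares regression.
# The sampled states, feature map, cost, and nominal transition are assumptions.
import numpy as np

def fitted_value_iteration(sample_states, actions, step, cost, features,
                           gamma=0.95, n_iters=50):
    """step(x, u) returns the nominal successor state; features(x) a (k,) vector."""
    X = np.stack([features(x) for x in sample_states])  # (N, k) design matrix
    theta = np.zeros(X.shape[1])                         # linear value weights
    for _ in range(n_iters):
        # Bellman backup targets at the sampled states (minimum over actions).
        y = np.array([
            min(cost(x, u) + gamma * features(step(x, u)) @ theta
                for u in actions)
            for x in sample_states
        ])
        # Refit the approximate cost-to-go by least squares.
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta
```

Replacing the linear regression with a neural network, or the sampled states with a fixed grid, recovers the other variants mentioned in the quotation.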
“…The third technique, neural-fitted value function for policy iteration (N-FVPI), represents a class of value-based RL methods, where a neural network is used to represent the value function vπ(s) to handle the continuous state space (Heess et al., 2015). During the policy evaluation step, the value function’s parameters are optimized to reduce the one-step squared Bellman residual via gradient descent (Lutter et al., 2021). Like the previous approaches, the policy is implicitly derived by selecting an action that maximizes the Bellman equation in equation (3).…”
Section: Algorithmic Performance Evaluation
Citation type: mentioning, confidence: 99%
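As a hedged illustration of the N-FVPI update described above, the snippet below performs gradient descent on the one-step squared Bellman residual of a small neural value network; the architecture, optimizer settings, and data interface are assumptions rather than details of the cited work.

```python
# Sketch of a neural-fitted value function update: minimize the one-step
# squared Bellman residual by gradient descent. State dimension, network
# size, and the batch interface are illustrative assumptions.
import torch
import torch.nn as nn

value_net = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
gamma = 0.99

def bellman_residual_step(states, next_states, rewards):
    """states, next_states: (N, 4) tensors; rewards: (N,) tensor."""
    v = value_net(states).squeeze(-1)
    v_next = value_net(next_states).squeeze(-1)
    residual = rewards + gamma * v_next - v      # one-step Bellman residual
    loss = (residual ** 2).mean()                # squared residual, averaged
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

As in the quotation, the greedy policy is implicit: at a given state one evaluates the right-hand side of the Bellman equation for candidate actions and picks the maximizer.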