Robotics: Science and Systems XVII 2021
DOI: 10.15607/rss.2021.xvii.007

Robust Value Iteration for Continuous Control Tasks

Abstract: When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well. Commonly, the optimal policy overfits to the approximate model and the corresponding state distribution, often resulting in failure to transfer under distributional shifts. In this paper, we present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain and incorporates adversarial perturbations…
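The abstract sketches a dynamic-programming procedure: compute the optimal value function over a compact state domain while an adversary perturbs the dynamics. The snippet below is a minimal illustrative sketch of that general idea, assuming a finite grid over the state domain, finite candidate action and disturbance sets, and user-supplied `step` and `reward` functions; it is not the paper's implementation.

```python
# Minimal sketch of robust value iteration on a discretized (compact) state
# domain. The grid, the finite action/disturbance sets, and the step/reward
# interface are illustrative assumptions, not the paper's implementation.
import numpy as np

def robust_value_iteration(states, actions, disturbances, step, reward,
                           gamma=0.99, n_iters=200, tol=1e-6):
    """states: (S, d) grid on a compact domain; actions, disturbances: finite sets.
    step(x, u, w) returns the successor state; reward(x, u) the stage reward."""
    V = np.zeros(len(states))

    def nearest(x):
        # Project a successor state back onto the grid (nearest neighbour).
        return np.argmin(np.linalg.norm(states - x, axis=1))

    for _ in range(n_iters):
        V_new = np.empty_like(V)
        for i, x in enumerate(states):
            # Maximize over actions the worst case (minimum) over disturbances.
            V_new[i] = max(
                min(reward(x, u) + gamma * V[nearest(step(x, u, w))]
                    for w in disturbances)
                for u in actions
            )
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

The nearest-neighbour projection merely stands in for whatever interpolation or function approximation is used on the continuous domain; the essential point is the max-min Bellman backup.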

Cited by 12 publications (7 citation statements) · References 36 publications
“…Their sim-to-sim evaluation on four MuJoCo tasks showed that agents trained with the suggested adversarial randomization generalize slightly better to domain parameter configurations than agents trained with a static randomization scheme. Lutter et al (2021a) derived the optimal policy together with different optimal disturbances from the value function in a continuous state, action, and time RL setting. Despite outstanding sim-to-real transferability of the resulting policies, the presented approach is conceptually restricted by assuming access to a compact representation of the state domain, typically obtained through exhaustive sampling, which hinders the scalability to high-dimensional tasks.…”
Section: Domain Randomization for Sim-to-Real Transfer
Citation type: mentioning, confidence: 99%
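The statement above notes that the cited work derives the optimal policy together with the optimal disturbances directly from the value function in a continuous state, action, and time setting. As a rough illustration of how such closed-form expressions can arise, the sketch below assumes control-affine dynamics dx/dt = a(x) + B(x)u + w, an instantaneous reward r(x) - uᵀRu, and a norm-bounded disturbance; these modelling choices and function names are assumptions, not the cited derivation.

```python
# Hedged sketch: greedy action and worst-case disturbance obtained from the
# value-function gradient, assuming control-affine dynamics
# dx/dt = a(x) + B(x) u + w, reward r(x) - u^T R u, and ||w|| <= w_max.
import numpy as np

def greedy_action_and_disturbance(x, grad_V, B, R, w_max):
    """grad_V(x): gradient of the value function at x; B(x): input matrix;
    R: positive-definite action penalty; w_max: disturbance norm bound."""
    dV = grad_V(x)
    # Policy: maximizing the Hamiltonian over u yields a closed-form action.
    u_star = 0.5 * np.linalg.solve(R, B(x).T @ dV)
    # Adversary: the bounded disturbance that decreases the value fastest.
    norm = np.linalg.norm(dV)
    w_star = -w_max * dV / norm if norm > 1e-9 else np.zeros_like(dV)
    return u_star, w_star
```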
“…This is important because the location component of system states in our DP framework is a continuous variable. For continuous-state DP problems, two common approaches for solving the problem numerically are either to approximate the optimal cost functions (e.g., by using least-squares regression or neural networks) or to discretize the state space [40]- [42].…”
Section: Implementation and Complexity Analysis
Citation type: mentioning, confidence: 99%
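To make the first of the two numerical strategies mentioned above concrete, here is a brief sketch of fitted value iteration that approximates the optimal cost function with least-squares regression over state features; the sampled states, feature map, cost, and transition interface are placeholder assumptions.

```python
# Illustrative sketch of continuous-state DP via function approximation:
# a linear-in-features cost-to-go fitted by least-squares regression.
# The sampled states, feature map, cost, and nominal transition are assumptions.
import numpy as np

def fitted_value_iteration(sample_states, actions, step, cost, features,
                           gamma=0.95, n_iters=50):
    """step(x, u) returns the nominal successor state; features(x) a (k,) vector."""
    X = np.stack([features(x) for x in sample_states])  # (N, k) design matrix
    theta = np.zeros(X.shape[1])                         # linear value weights
    for _ in range(n_iters):
        # Bellman backup targets at the sampled states (minimum over actions).
        y = np.array([
            min(cost(x, u) + gamma * features(step(x, u)) @ theta
                for u in actions)
            for x in sample_states
        ])
        # Refit the approximate cost-to-go by least squares.
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta
```

Replacing the linear regression with a neural network, or the sampled states with a fixed grid, recovers the other variants mentioned in the quotation.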
“…The third technique, neural-fitted value function for policy iteration (N-FVPI), represents a class of value-based RL methods, where a neural network is used to represent the value function vπ(s) to handle the continuous state space (Heess et al., 2015). During the policy evaluation step, the value function’s parameters are optimized to reduce the one-step squared Bellman residual via gradient descent (Lutter et al., 2021). Like the previous approaches, the policy is implicitly derived by selecting an action that maximizes the Bellman equation in equation (3).…”
Section: Algorithmic Performance Evaluation
Citation type: mentioning, confidence: 99%
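As a hedged illustration of the N-FVPI update described above, the snippet below performs gradient descent on the one-step squared Bellman residual of a small neural value network; the architecture, optimizer settings, and data interface are assumptions rather than details of the cited work.

```python
# Sketch of a neural-fitted value function update: minimize the one-step
# squared Bellman residual by gradient descent. State dimension, network
# size, and the batch interface are illustrative assumptions.
import torch
import torch.nn as nn

value_net = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
gamma = 0.99

def bellman_residual_step(states, next_states, rewards):
    """states, next_states: (N, 4) tensors; rewards: (N,) tensor."""
    v = value_net(states).squeeze(-1)
    v_next = value_net(next_states).squeeze(-1)
    residual = rewards + gamma * v_next - v      # one-step Bellman residual
    loss = (residual ** 2).mean()                # squared residual, averaged
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

As in the quotation, the greedy policy is implicit: at a given state one evaluates the right-hand side of the Bellman equation for candidate actions and picks the maximizer.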