2012
DOI: 10.1613/jair.3761
Safe Exploration of State and Action Spaces in Reinforcement Learning

Abstract: In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some states may result in …

Cited by 192 publications (232 citation statements)
References 47 publications
“…These methods are typically combined with low-dimensional policy representations that are initialized on prior data, e. g., human demonstrations or trajectories optimized using an initial model [4], [5], [3]. Limiting policy updates between iterations lowers the probability of straying into unexplored state space, but does not provide any guarantees [22]. The associated risk is also evidenced by the results reported in Section III: if the search space includes obstacles, collisions can and, in general, will occur between the robot and obstacles.…”
Section: Related Work (mentioning)
confidence: 99%
“…While most Reinforcement Learning (RL) tasks [28] are focused on maximizing a long-term cumulative reward, RL researchers are also paying increasing attention to the safety of the approaches (e.g., avoiding visits to undesirable situations, collisions, crashes, etc.) during the training process [9,10]. Thus, when using RL techniques in dangerous control tasks, an important question arises; namely, how can we ensure that the exploration of the state-action space will not cause damage or injury while, at the same time, learning (near-)optimal policies?…”
Section: Introduction (mentioning)
confidence: 99%
“…Instead, evolutionary approaches [16,20,21] are usually based on evolving populations of neural networks (each one representing a different policy) with the aim of finding a better policy in each generation. However, the random generation of the initial population and the random mutation of the neural networks during evolution are made to visit undesirable states repeatedly [9]. Therefore, most of these approaches suffer from the same problem: the absence of mechanisms for avoiding the visit to undesirable situations during the exploration of the state and action spaces.…”
Section: Introduction (mentioning)
confidence: 99%
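The excerpt above contrasts evolutionary policy search with safe exploration: random initialization and random mutation of the networks can repeatedly drive a policy into undesirable states. The following sketch is purely illustrative and not taken from the cited paper or any citing work; the toy 1-D environment, one-unit "network", mutation scale, and "unsafe" threshold are all invented here to make the point concrete: nothing in the loop filters or penalizes unsafe states before the rollout has already visited them.

```python
# Illustrative neuroevolution loop (assumed toy setup, not any cited method).
import numpy as np

rng = np.random.default_rng(0)

def rollout(weights, steps=50):
    """Run one episode of a toy 1-D task; return (return, unsafe_visits)."""
    w1, w2 = weights
    x, total, unsafe = 0.0, 0.0, 0
    for _ in range(steps):
        h = np.tanh(w1 * x)        # tiny 1-hidden-unit "policy network"
        a = np.tanh(w2 * h)        # action in [-1, 1]
        x += a                     # state transition
        if abs(x) > 2.0:           # invented "unsafe" region
            unsafe += 1
        total += -abs(x - 1.0)     # reward: stay near x = 1
    return total, unsafe

population = [rng.normal(size=2) for _ in range(20)]
for gen in range(10):
    scored = sorted(population, key=lambda w: rollout(w)[0], reverse=True)
    parents = scored[:5]
    # Random Gaussian mutation: offspring may wander into the unsafe region,
    # and the loop only learns this after the unsafe states have been visited.
    population = [p + rng.normal(scale=0.5, size=2)
                  for p in parents for _ in range(4)]
    best_return, best_unsafe = rollout(scored[0])
    print(f"gen {gen}: best return {best_return:.1f}, unsafe visits {best_unsafe}")
```

Counting the unsafe visits of the best individual per generation shows the behaviour the citing authors describe: selection pressure alone reduces, but never rules out, excursions into the undesirable region during exploration.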