2018 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2018.8460547

OptLayer - Practical Constrained Optimization for Deep Reinforcement Learning in the Real World

Abstract: While deep reinforcement learning techniques have recently produced considerable achievements on many decision-making problems, their use in robotics has largely been limited to simulated worlds or restricted motions, since unconstrained trial-and-error interactions in the real world can have undesirable consequences for the robot or its environment. To overcome such limitations, we propose a novel reinforcement learning architecture, OptLayer, that takes as inputs possibly unsafe actions predicted by a neural…

Cited by 93 publications (82 citation statements)
References 9 publications
“…First, by improving the capability of the corrective controller, a larger safe region can be acquired, which reduces the conflicts between safety and learning performance. Second, safety can also be incorporated in the reward function of the learning algorithm [48]. By encouraging safe behaviors, the learning-based controller tends to stay within the safe region such that less guidance is needed from the supervisor.…”
Section: B. Safety and Learning Performance
confidence: 99%
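As a rough illustration of the reward-shaping idea attributed to [48] above, a penalty proportional to the amount of constraint violation can be subtracted from the task reward so that the learned policy is nudged toward the safe region. The sketch below is an assumption of ours, not code from the cited work; the function name, the convention g_i(s, a) <= 0 for "safe", and the penalty weight are all hypothetical.

import numpy as np

def shaped_reward(task_reward, constraint_values, penalty_weight=10.0):
    """Subtract a safety penalty from the task reward.

    constraint_values: array of g_i(s, a); positive entries indicate
    violated constraints under the convention g_i(s, a) <= 0 means safe.
    """
    violation = np.maximum(constraint_values, 0.0).sum()
    return task_reward - penalty_weight * violation

With this shaping, trajectories that stay inside the safe set keep their original return, while violations are discouraged in proportion to their magnitude.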
“…Recently, Pham, De Magistris, and Tachibana (2018) have suggested enforcing constraints by projecting any unconstrained point onto the constrained space by solving an optimisation program that minimises the L2 distance and backpropagating through it to train the network (Amos and Kolter 2017). This approach is very time-consuming, as it requires solving a quadratic program (QP) in the forward pass at every training iteration and, as a result, does not scale to problems with large-dimensional action spaces (Amos and Kolter 2017) seen in practical screening problems.…”
Section: Constrained Action-Space RL
confidence: 99%
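For concreteness, the L2 projection described above amounts to the QP: minimise ||a - a_raw||^2 over actions a subject to the chosen constraints. The sketch below solves this QP with cvxpy, assuming linear inequality and equality constraints, and shows only the forward pass; backpropagating through the projection, as OptLayer does (following Amos and Kolter 2017), would additionally require a differentiable QP solver and is not shown here.

import cvxpy as cp
import numpy as np

def project_action(a_raw, G, h, A=None, b=None):
    """Return the closest action to a_raw (in L2 distance) satisfying
    G a <= h and, optionally, A a == b, by solving a small QP."""
    a = cp.Variable(a_raw.shape[0])
    constraints = [G @ a <= h]
    if A is not None:
        constraints.append(A @ a == b)
    problem = cp.Problem(cp.Minimize(cp.sum_squares(a - a_raw)), constraints)
    problem.solve()
    return a.value

# Hypothetical usage: clip a 2-D action into the box [-1, 1]^2.
G_box = np.vstack([np.eye(2), -np.eye(2)])
h_box = np.ones(4)
print(project_action(np.array([2.0, -0.3]), G_box, h_box))  # approx. [1.0, -0.3]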
“…Our RL approach is similar in spirit to Bhatia, Varakantham, and Kumar (2019), which uses a complicated variable-length iterative approximation of the L2 projection to deal with a specific subset of linear constraints faster than Pham, De Magistris, and Tachibana (2018). The linear constraints they can handle are constraints on the sums of sets of variables, where the sets must form a hierarchy.…”
Section: Constrained Action-Space RL
confidence: 99%
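To illustrate what "constraints on the sums of sets of variables, where the sets form a hierarchy" can look like, the snippet below encodes such constraints as rows of G a <= h for a 4-dimensional action split into two groups nested inside a root group; all indices, capacities, and names are hypothetical.

import numpy as np

# Hypothetical hierarchy over a 4-dimensional action vector:
# two leaf groups nested inside a single root group.
hierarchy = {"all": [0, 1, 2, 3], "group_a": [0, 1], "group_b": [2, 3]}
capacities = {"all": 1.0, "group_a": 0.6, "group_b": 0.6}

# Each hierarchical sum constraint becomes one row of G a <= h.
G = np.zeros((len(hierarchy), 4))
h = np.zeros(len(hierarchy))
for row, (name, indices) in enumerate(hierarchy.items()):
    G[row, indices] = 1.0
    h[row] = capacities[name]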
“…In our study, we trained DRL controllers in a purely model-free fashion, i.e., no specific knowledge of the data-center operation was pre-encoded in the neural networks or the training process; instead, only abstract state and action vectors were used as inputs and outputs. Note that models can be used to accelerate training when available [10], and the quality of the resulting policy ultimately depends on the quality of the models. Domain adaptation and transfer learning [13,4] constitute important steps towards the application of such model-based DRL techniques.…”
Section: Model-Free Reinforcement Learning
confidence: 99%