2019
DOI: 10.1109/access.2019.2952651

Safe Q-Learning Method Based on Constrained Markov Decision Processes

Abstract: The application of reinforcement learning in industrial fields makes the safety problem of the agent a research hotspot. Traditional methods mainly alter the objective function and the exploration process of the agent to address the safety problem. Those methods, however, can hardly prevent the agent from falling into dangerous states because most of the methods ignore the damage caused by unsafe states. As a result, most solutions are not satisfactory. In order to solve the aforementioned problem, we come for…

Cited by 24 publications (10 citation statements) | References 30 publications

“…This primarily involves the consideration of safety-relevant parameters to avoid critical actions and threats. In this context, a constraint-driven approach in non-deep RL was proposed by Ge et al (2019), in which permitted actions were limited through preliminary filtering, or by Xiong and Diao (2021) who proposed a safety-based evaluation of policy robustness. Further studies should approximate the simulations and frameworks to real-world conditions even more, which includes consideration of hard real-time requirements, significant parameters, uncertainties, and indeterminacies.…”
Section: Future Research Agenda (mentioning)
confidence: 99%
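
As an illustration of the action-filtering idea described in this excerpt, below is a minimal sketch of a tabular Q-learning step that masks constraint-violating actions before selection. The env_step interface, the is_unsafe predicate, and all hyperparameter values are assumptions made for illustration, not the implementation of Ge et al. (2019).

import random
from collections import defaultdict

# Minimal sketch of constraint-based action filtering in tabular Q-learning.
# The environment interface (env_step), the is_unsafe predicate, and the
# hyperparameter values are illustrative assumptions, not the cited method.

def safe_actions(state, actions, is_unsafe):
    # Keep only actions not flagged as violating the safety constraint.
    allowed = [a for a in actions if not is_unsafe(state, a)]
    return allowed if allowed else list(actions)  # fall back if every action is flagged

def q_learning_step(Q, state, actions, env_step, is_unsafe,
                    alpha=0.1, gamma=0.99, epsilon=0.1):
    # Restrict both exploration and greedy selection to the filtered action set.
    allowed = safe_actions(state, actions, is_unsafe)
    if random.random() < epsilon:
        action = random.choice(allowed)
    else:
        action = max(allowed, key=lambda a: Q[(state, a)])
    next_state, reward, done = env_step(state, action)
    # Standard Q-learning target; the max is taken over the full action set.
    target = reward if done else reward + gamma * max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return next_state, done

Q = defaultdict(float)  # state-action value table, keyed by (state, action)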
“…There is no guarantee that an EV has a fully-charged battery at its time of departure set in a specific charging schedule because of the random behavior of the EVs in arrival and departure times. The authors in [100] address this issue by considering the EV problem as a constrained Markov Decision Process (CMDP) and solve it with constrained policy optimization (CPO) [122]. They propose a real-time strategy for charging and discharging the EVs that tackles the random behavior of EVs in arrival and departure, prices of electricity, and battery remnant energy.…”
Section: Real-time Charging (mentioning)
confidence: 99%
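
For reference, the generic constrained MDP objective that methods such as CPO optimize can be stated as follows; the symbols (reward r, cost c, discount factor \gamma, cost budget d) are the standard CMDP quantities, not values taken from [100]:

\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d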
“…In most cases, a stochastic (e.g., random) policy is selected as the behaviour policy to ensure enough exploration of new states. One of the most practiced off-policy methods is known as Q-learning [41], [42], [44], [46], which updates the value function using the Bellman optimality equation as follows…”
Section: B Off-policy TD Learning (mentioning)
confidence: 99%
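
The quoted passage is truncated before the equation; the standard tabular Q-learning update derived from the Bellman optimality equation, which it appears to refer to, is:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\right]

where \alpha is the learning rate and \gamma is the discount factor.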