2020 20th International Conference on Control, Automation and Systems (ICCAS)
DOI: 10.23919/iccas50221.2020.9268413
Mixed Reinforcement Learning for Efficient Policy Optimization in Stochastic Environments

Cited by 8 publications (5 citation statements)
References 11 publications
“…For the minimum return to work in the task, we first choose a tolerable value of each state during an episode, i.e., x = 2, θ = 0.1, ẋ = 0.1, θ̇ = 0.05, to compute a tolerable reward using (18). Then the minimum return equals the reward times the episode length 100.…”
Section: B. Inverted Pendulum Task
confidence: 99%

Citing work: Guan, Duan, Li, et al., "Mixed Policy Gradient" (2021, preprint; self-citation)
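The quoted threshold is simple arithmetic: pick tolerable values for each state, evaluate the per-step reward at those values, and multiply by the 100-step episode length. A minimal sketch follows, with the caveat that Eq. (18) of the citing paper is not reproduced in this excerpt, so the quadratic reward and its weights below are placeholder assumptions rather than the paper's actual formula.

```python
# Sketch of the quoted minimum-return computation. The quadratic reward is a
# PLACEHOLDER for Eq. (18) of the citing paper (not reproduced here); the
# weights are purely illustrative.
def reward(x, theta, x_dot, theta_dot, weights=(0.25, 10.0, 1.0, 1.0)):
    w_x, w_th, w_xd, w_thd = weights
    return -(w_x * x**2 + w_th * theta**2 + w_xd * x_dot**2 + w_thd * theta_dot**2)

# Tolerable state values quoted in the citation statement.
tolerable_state = dict(x=2.0, theta=0.1, x_dot=0.1, theta_dot=0.05)

tolerable_reward = reward(**tolerable_state)   # per-step reward at the tolerance
episode_length = 100

# Minimum return = tolerable per-step reward times the episode length.
min_return = tolerable_reward * episode_length
print(f"tolerable reward per step: {tolerable_reward:.4f}")
print(f"minimum return over {episode_length} steps: {min_return:.2f}")
```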
“…To address this problem, Parmas et al. (2019) developed a total propagation algorithm that introduces an additional PG estimator with the likelihood ratio trick and automatically gives greater weight to estimators with lower variance [17]. The second drawback is that the PG from BPTT is sensitive to model errors, especially for a long prediction horizon [18]. This problem has also been shown in [16], in which a value function is integrated after a certain number of forward steps driven by the model to increase robustness to model error and extend BPTT to infinite-horizon control.…”
Section: Introduction
confidence: 99%

Citing work: Guan, Duan, Li, et al., "Mixed Policy Gradient" (2021, preprint; self-citation)
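This statement contrasts the estimator mixing of [17] with the remedy in [16], where a value function closes the model rollout after a few steps so that BPTT never differentiates through a long, error-prone model horizon. Below is a minimal sketch of that truncated-rollout idea, assuming hypothetical differentiable policy, model, and value_fn modules; it illustrates the general technique, not the cited papers' actual code.

```python
import torch

def short_horizon_bptt_loss(policy, model, value_fn, state, horizon=5, gamma=0.99):
    """Policy loss from a short differentiable model rollout, bootstrapped with a
    value function so model errors only compound over `horizon` steps."""
    ret = torch.zeros(state.shape[0])
    discount = 1.0
    for _ in range(horizon):
        action = policy(state)                 # differentiable action
        state, reward = model(state, action)   # learned, differentiable dynamics
        ret = ret + discount * reward
        discount *= gamma
    # Value-function bootstrap: completes the return without further model steps.
    ret = ret + discount * value_fn(state).squeeze(-1)
    return -ret.mean()                         # gradient ascent on predicted return
```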
“…As a self-learning method, RL is promising to reduce the massive engineering efforts in autonomous driving. In recent years, there has been a growing interest towards RL in autonomous driving community, such as adaptive cruise control [6], lane-keeping [7], trajectory tracking [8] and multi-vehicle cooperation [9]. However, despite achieving decent performance, these RL methods mostly lack explicit safety constraints, which significantly limits their application in safety-critical autonomous driving.…”
Section: Introduction
confidence: 99%
“…With the development of artificial intelligence technologies, autonomous driving has become an important tendency in the automotive industry for its potential to improve road safety, reduce fuel consumption, and improve traffic efficiency [1], [2]. There are two kinds of schemes widely employed by the autonomous driving system: hierarchical scheme [3], [4], [5], [6] and end-to-end scheme [7], [8], [9]. The hierarchical scheme divides the entire autonomous driving system into several modules, including environment perception, decision-making, and motion control [3], [4], [10].…”
Section: Introduction
confidence: 99%