2014 IEEE International Conference on Robotics and Automation (ICRA) 2014
DOI: 10.1109/icra.2014.6907332

Sample path sharing in simulation-based policy improvement

Abstract: Simulation-based policy improvement (SBPI) has been widely used to improve given base policies through simulation. The basic idea of SBPI is to estimate all the Q-factors for a given state using simulation, and then select the action that achieves the minimal cost. It is therefore of great importance to efficiently use the given budget in order to select the best action with high probability. Different from existing budget allocation algorithms that estimate Q-factors by independent simulation, we share the sam…
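The basic SBPI step described in the abstract can be illustrated with a minimal sketch. It assumes a finite-horizon cost criterion, an equal split of the simulation budget across actions, and independent sample paths per action; it is the plain baseline, not the sample path sharing scheme the paper proposes. The toy two-state model and all names (toy_step, base_policy, estimate_q, improved_action) are hypothetical and chosen only for illustration.

```python
import random


def toy_step(state, action, rng):
    """Hypothetical two-state MDP: returns (next_state, stage_cost)."""
    base_cost = 1.0 if action == 0 else 3.0      # action 0 is cheaper by construction
    cost = base_cost + rng.uniform(0.0, 0.1)     # small simulation noise
    next_state = 0 if rng.random() < 0.5 else 1  # transitions ignore the action here
    return next_state, cost


def base_policy(state):
    """Hypothetical base policy to be improved: always picks action 1."""
    return 1


def estimate_q(state, action, step, policy, horizon, n_paths, rng):
    """Monte Carlo Q-factor: apply `action` first, then follow `policy`."""
    total = 0.0
    for _ in range(n_paths):
        s, a, path_cost = state, action, 0.0
        for _ in range(horizon):
            s, c = step(s, a, rng)
            path_cost += c
            a = policy(s)  # after the first step, act according to the base policy
        total += path_cost
    return total / n_paths


def improved_action(state, actions, step, policy, horizon, budget, rng):
    """Estimate every Q-factor at `state` and return the minimal-cost action."""
    n_paths = budget // len(actions)  # equal allocation (assumption, not the paper's rule)
    q = {a: estimate_q(state, a, step, policy, horizon, n_paths, rng) for a in actions}
    return min(q, key=q.get), q


best, q_hat = improved_action(0, [0, 1], toy_step, base_policy,
                              horizon=5, budget=200, rng=random.Random(0))
print(best, q_hat)
```

In this sketch each action gets its own independently simulated paths, which is the setting the abstract contrasts with; a budget allocation rule or shared sample paths would replace the equal split inside improved_action.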

Citations: cited by 3 publications (5 citation statements)
References: 43 publications (24 reference statements)
“…Herein, we compare our method with five existing methods (EA, OCBAPI [14], OCBA-S [15], EA-sample accumulation (SA), and OCBAPI-SA) using two MDP models, namely, a two-state example and its extended version. The extended version is used to verify the effectiveness and efficiency of the proposed method in a more complex manner.…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…a Sample path sharing is used to calculate Q^{T,S}_π(s, a); see more details in [15]. The two-state example was expanded by increasing the available actions in state s_2, as shown in Figure 3.…”
Section: Methods (citation type: mentioning)
confidence: 99%