2022
DOI: 10.3390/electronics11071141

An Efficient Simulation-Based Policy Improvement with Optimal Computing Budget Allocation Based on Accumulated Samples

Abstract: Markov decision processes (MDPs) are widely used to model stochastic systems and to deduce optimal decision-making policies. As the transition probabilities are usually unknown in MDPs, simulation-based policy improvement (SBPI) has been suggested, which derives improved policies from a base policy without requiring the state transition probabilities. However, estimating the Q-value of each action to determine the best action in each state requires many simulations, which creates efficiency problems for SBPI. In this study…
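To make the mechanism concrete, below is a minimal sketch of one SBPI step at a single state combined with a standard sequential OCBA allocation rule. It is an illustration under stated assumptions, not the paper's exact algorithm (the paper's contribution is an OCBA variant based on accumulated samples, whose details are not given here); the function name `sbpi_improve_state` and the simulator hook `simulate_return` are hypothetical.

```python
import math

def sbpi_improve_state(state, actions, simulate_return, total_budget, n0=10):
    """One simulation-based policy-improvement step at a single state.

    Estimates Q(state, a) for each action by Monte Carlo simulation and
    returns the action with the highest estimate. After n0 initial samples
    per action, each additional sample goes to the action furthest below
    its OCBA target proportion, so the budget concentrates on actions that
    are hard to distinguish from the current best.

    `simulate_return(state, action)` is a user-supplied (hypothetical)
    function that simulates one episode -- the given action first, the
    base policy afterwards -- and returns the observed discounted return.
    """
    if len(actions) < 2:
        return actions[0]

    samples = {a: [simulate_return(state, a) for _ in range(n0)] for a in actions}
    spent = n0 * len(actions)

    def mean(xs):
        return sum(xs) / len(xs)

    def std(xs):
        m = mean(xs)
        return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) + 1e-9

    while spent < total_budget:
        means = {a: mean(samples[a]) for a in actions}
        stds = {a: std(samples[a]) for a in actions}
        best = max(actions, key=means.get)

        # Classic OCBA ratios: for a != best, w_a is proportional to
        # (sigma_a / gap_a)^2, where gap_a is the mean difference to the
        # current best; the best design gets
        # w_best = sigma_best * sqrt(sum_{a != best} (w_a / sigma_a)^2).
        w = {}
        for a in actions:
            if a != best:
                gap = abs(means[best] - means[a]) + 1e-9
                w[a] = (stds[a] / gap) ** 2
        w[best] = stds[best] * math.sqrt(
            sum((w[a] / stds[a]) ** 2 for a in actions if a != best)
        )
        total = sum(w.values())

        # Sample the action furthest below its target share of the budget.
        target = max(actions, key=lambda a: w[a] / total - len(samples[a]) / spent)
        samples[target].append(simulate_return(state, target))
        spent += 1

    return max(actions, key=lambda a: mean(samples[a]))
```

The point of the allocation rule is that samples are spent unevenly: actions whose estimated Q-values sit close to the current best, relative to their noise, receive most of the remaining budget, while clearly inferior actions are sampled only enough to confirm they are inferior.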

Cited by 1 publication (1 citation statement)
References 16 publications
“…Ref. [4] proposed a simulation-based policy improvement (SBPI) scheme to obtain optimal policies for Markov decision processes (MDPs) in which the state transition probabilities are unknown. In particular, a new method was introduced to improve the overall efficiency of SBPI by using optimal computing budget allocation (OCBA) methods based on accumulated samples.…”
Section: The Present Issue (mentioning)
confidence: 99%