Bouchra El Akraoui scite author profile

Conventional algorithms for solving Markov decision processes (MDPs) become intractable for a large finite state and action spaces. Several studies have been devoted to this issue, but most of them only treat infinite-horizon MDPs. This paper is one of the first works to deal with non-stationary finite-horizon MDPs by proposing a new decomposition approach, which consists in partitioning the problem into smaller restricted finite-horizon MDPs, each restricted MDP is solved independently, in a specific order, using the proposed hierarchical backward induction (HBI) algorithm based on the backward induction (BI) algorithm. Next, the sub-local solutions are combined to obtain a global solution. An example of racetrack problems shows the performance of the proposal decomposition technique.

show abstract

Solving Finite-Horizon Discounted Non-Stationary MDPS

Akraoui

Daoui

2023

View full text Add to dashboard Cite

Research background Markov Decision Processes (MDPs) are a powerful framework for modeling many real-world problems with finite-horizons that maximize the reward given a sequence of actions. Although many problems such as investment and financial market problems where the value of a reward decreases exponentially with time, require the introduction of interest rates. Purpose This study investigates non-stationary finite-horizon MDPs with a discount factor to account for fluctuations in rewards over time. Research methodology To consider the fluctuations of rewards with time, the authors define new nonstationary finite-horizon MDPs with a discount factor. First, the existence of an optimal policy for the proposed finite-horizon discounted MDPs is proven. Next, a new Discounted Backward Induction (DBI) algorithm is presented to find it. To enhance the value of their proposal, a financial model is used as an example of a finite-horizon discounted MDP and an adaptive DBI algorithm is used to solve it. Results The proposed method calculates the optimal values of the investment to maximize its expected total return with consideration of the time value of money. Novelty No existing studies have before examined dynamic finite-horizon problems that account for temporal fluctuations in rewards.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Bouchra El Akraoui

Optimization Focused on Parallel Fuzzy Deep Belief Neural Network for Opinion Mining

Deep Reinforcement Learning for Bitcoin Trading

Deep Learning for Medical Image Segmentation

Decomposition Methods for Solving Finite‐Horizon Large MDPs

Solving Finite-Horizon Discounted Non-Stationary MDPS

Contact Info

Product

Resources

About