2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
DOI: 10.1109/allerton.2017.8262843

Transition-based versus state-based reward functions for MDPs with Value-at-Risk

Abstract: In reinforcement learning, a reward function defined on the current state and action is widely used. When the objective concerns only the expectation of the (discounted) total reward, this works perfectly. However, if the objective involves the distribution of the total reward, the result will be wrong. This paper studies Value-at-Risk (VaR) problems in short- and long-horizon Markov decision processes (MDPs) with two reward functions which share the same expectations. Firstly we show that with the VaR objective, when the real reward…
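To make the abstract's point concrete, here is a minimal sketch that is not taken from the paper: the two-outcome process, its probabilities, and the rewards are hypothetical. It compares a transition-based reward with the state-based reward obtained by taking its expectation; both give the same expected total reward, but their return distributions, and hence their VaR, differ.

```python
import numpy as np

# Hypothetical one-step Markov reward process (illustrative only).
# From state s0 the process moves to s1 with probability 0.5 (reward 10)
# or to s2 with probability 0.5 (reward 0).
rng = np.random.default_rng(0)
n = 100_000

# Transition-based reward: the realised reward depends on the next state.
next_state_is_s1 = rng.random(n) < 0.5
returns_transition = np.where(next_state_is_s1, 10.0, 0.0)

# State-based reward: replace the random reward by its expectation at s0.
returns_state = np.full(n, 5.0)

alpha = 0.4
for name, ret in [("transition-based", returns_transition),
                  ("state-based", returns_state)]:
    # VaR_alpha is taken here as the alpha-quantile of the total reward.
    print(f"{name}: mean = {ret.mean():.2f}, "
          f"VaR_{alpha} = {np.quantile(ret, alpha):.2f}")

# Both reward models give the same expected total reward (5.0),
# but their return distributions, and therefore their VaR, differ.
```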

Cited by 2 publications (7 citation statements). References 26 publications.
“…Since many RL methods require the reward function to be deterministic and state-based, the transformation is needed for the MDPs with other types of reward functions in the risk-sensitive problems. We generalize the transformation (Ma and Yu 2017) in different settings, and consider VaR as an example to show the effect of reward simplification on distribution.…”
Section: Results
confidence: 99%
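One standard way to meet the "deterministic and state-based" requirement mentioned in this quotation is to augment the state so that the reward becomes a deterministic function of the augmented state. The sketch below illustrates that idea for a transition-based reward r(s, a, s'); it is offered as an assumption-laden illustration, not necessarily the exact transformation of Ma and Yu (2017).

```python
from typing import Callable, Hashable, Tuple

State = Hashable
Action = Hashable
AugState = Tuple[State, Action, State]   # (previous state, action, current state)

def make_state_based_reward(
    r_transition: Callable[[State, Action, State], float]
) -> Callable[[AugState], float]:
    """Wrap a transition-based reward r(s, a, s') as a reward that depends
    only on an augmented state (s, a, s').  This is one common construction
    for obtaining a deterministic, state-based reward; it is a sketch, not
    necessarily the construction used in the cited work."""
    def r_state(x: AugState) -> float:
        s_prev, a_prev, s_cur = x
        return r_transition(s_prev, a_prev, s_cur)
    return r_state

# Usage: the augmented MDP tracks (s, a, s') as its state, so the realised
# reward is recovered without changing the distribution of the total reward.
r = make_state_based_reward(lambda s, a, s2: 10.0 if s2 == "s1" else 0.0)
print(r(("s0", "go", "s1")))   # -> 10.0
```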
“…In RL, when the expected return is considered and the Q-function or the value function is accessed, such a reward simplification is implied. The effect of the reward simplification on the return distribution in a finite-horizon Markov reward process has been studied in (Ma and Yu 2017). Here we estimate the distribution assuming it is approximately normal, illustrate the similar effect on the return distribution, and generalize the transformation for more practical cases.…”
Section: Markov Decision Processes
confidence: 96%
“…In many cases, conditional VaR (also known as expected shortfall) is preferred over VaR since it is coherent [19], i.e., it has some intuitively reasonable properties (convexity, for example). However, when the return can be assumed to be approximately normally distributed, VaR can be simply estimated with E(Φ) and V(Φ) [20].…”
Section: Quantile-based Risk
confidence: 99%
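As a sketch of the normal-approximation estimate mentioned in the quotation above: if the return Φ is treated as Gaussian, its α-quantile follows directly from E(Φ) and V(Φ). The function name, the numbers, and the use of scipy below are illustrative assumptions, not part of the cited works.

```python
from scipy.stats import norm

def var_normal(mean: float, variance: float, alpha: float) -> float:
    """Normal-approximation VaR: the alpha-quantile of a Gaussian return
    with the given mean and variance, i.e. E(Phi) + z_alpha * sqrt(V(Phi)).
    Whether VaR refers to this quantile or to its negative (a loss figure)
    depends on the sign convention in use."""
    return mean + norm.ppf(alpha) * variance ** 0.5

# Example with hypothetical return statistics.
print(var_normal(mean=5.0, variance=25.0, alpha=0.05))   # approximately -3.22
```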