2019
DOI: 10.1609/aaai.v33i01.33014512

State-Augmentation Transformations for Risk-Sensitive Reinforcement Learning

Abstract: In the MDP framework, although the general reward function takes three arguments (current state, action, and successor state), it is often simplified to a function of two arguments (current state and action). The former is called a transition-based reward function, whereas the latter is called a state-based reward function. When the objective involves only the expected total reward, this simplification works perfectly. However, when the objective is risk-sensitive, this simplification leads to an incorrect valu…
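The abstract's claim is easy to demonstrate numerically. Below is a minimal sketch (not from the paper; the transition probabilities and reward values are hypothetical) showing that replacing a transition-based reward R(s, a, s') by its expectation R(s, a) preserves the expected total reward but changes the reward distribution, and hence any quantile-based risk measure computed from it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-step example: action a from state s reaches
# successor s' in {0, 1} with probability 1/2 each.
P = np.array([0.5, 0.5])
R_transition = np.array([0.0, 10.0])   # transition-based R(s, a, s')

# State-based simplification: replace R(s, a, s') by its expectation.
R_state = P @ R_transition             # R(s, a) = E[R(s, a, S')] = 5.0

n = 100_000
succ = rng.choice(2, size=n, p=P)
rewards_transition = R_transition[succ]      # random: 0 or 10
rewards_state = np.full(n, R_state)          # deterministic: always 5

# The expectation is preserved by the simplification ...
print(rewards_transition.mean(), rewards_state.mean())   # both ~5.0

# ... but the distribution, and so any quantile-based risk measure,
# is not: the 0.25-quantile is 0 under one model and 5 under the other.
alpha = 0.25
print(np.quantile(rewards_transition, alpha),
      np.quantile(rewards_state, alpha))
```

Both models report the same mean, but their quantiles differ, which is exactly the failure mode the paper addresses for risk-sensitive objectives.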

Cited by 3 publications (7 citation statements)
References 11 publications
“…A commonly used quantile-based risk measure is value at risk (VaR). For VaR estimation with the SAT, see [7]. In short, we claim that most, if not all, inherent risk measures depend on the reward sequence $(R_t : t \in \{1, \dots, N\})$, which can be preserved by the SAT in a risk-sensitive scenario.…”
Section: arXiv:1907.05231v1 [cs.LG] 9 Jul 2019 (mentioning)
confidence: 99%
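For context on the snippet above: value at risk at level $\alpha$ is the $\alpha$-quantile of the return distribution, $\mathrm{VaR}_\alpha(X) = \inf\{x : P(X \le x) \ge \alpha\}$. A minimal empirical estimator is sketched below; it is independent of the estimation method in [7], and the rollout data are hypothetical.

```python
import numpy as np

def empirical_var(returns: np.ndarray, alpha: float) -> float:
    """Empirical value at risk: the alpha-quantile of the return
    distribution, VaR_alpha(X) = inf{x : P(X <= x) >= alpha}."""
    return float(np.quantile(returns, alpha))

# Hypothetical Monte Carlo rollouts of an N-step reward sequence
# (R_t : t in {1, ..., N}); the risk measure is a function of the
# summed sequence, which is why the SAT must preserve the sequence
# itself and not merely its expectation.
rng = np.random.default_rng(1)
N, n_rollouts = 10, 50_000
rewards = rng.normal(loc=1.0, scale=2.0, size=(n_rollouts, N))
totals = rewards.sum(axis=1)                 # totals ~ N(10, 40)
print(empirical_var(totals, alpha=0.05))     # ~= 10 - 1.645 * sqrt(40)
```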
“…In short, we claim that most, if not all, inherent risk measures depend on the reward sequence $(R_t : t \in \{1, \dots, N\})$, which can be preserved by the SAT in a risk-sensitive scenario. For law-invariant risk estimations, since VaR estimation in a Markov process with a stochastic reward has been thoroughly studied in [7], in this paper, we focus on the first two risk estimations with Eq. ???…”
Section: arXiv:1907.05231v1 [cs.LG] 9 Jul 2019 (mentioning)
confidence: 99%
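The SAT (state-augmentation transformation) these snippets refer to makes a transition-based reward state-based by enlarging the state space. One common construction, sketched below under the assumption that it only loosely matches the paper's exact transformation (the helper names are illustrative), folds the realized transition reward into the successor state, so the augmented process carries the full reward sequence $(R_t)$ that inherent risk measures depend on.

```python
from typing import Callable, Tuple
import numpy as np

def augment(s, a, s_next, reward_fn: Callable) -> Tuple:
    """One step of a state-augmentation transformation (illustrative):
    fold the realized transition-based reward R(s, a, s') into the
    successor state, so the augmented MDP's reward can be read off
    the current augmented state alone."""
    return (s_next, reward_fn(s, a, s_next))

def state_based_reward(s_aug) -> float:
    """Reward of the augmented MDP: state-based by construction."""
    _, r = s_aug
    return r

# Hypothetical usage with a +/-1 random walk whose reward is the step:
rng = np.random.default_rng(2)
reward_fn = lambda s, a, s_next: float(s_next - s)
s_aug = (0, 0.0)
for _ in range(5):
    s, _ = s_aug
    s_next = s + int(rng.choice([-1, 1]))
    s_aug = augment(s, None, s_next, reward_fn)
    print(state_based_reward(s_aug))
```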