# State-Augmentation Transformations for Risk-Sensitive Reinforcement Learning (2019)

**Abstract:** In the MDP framework, although the general reward function takes three arguments (current state, action, and successor state), it is often simplified to a function of two arguments (current state and action). The former is called a transition-based reward function, whereas the latter is called a state-based reward function. When the objective involves only the expected total reward, this simplification works perfectly. However, when the objective is risk-sensitive, this simplification leads to an incorrect valu…
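The abstract's claim can be illustrated with a minimal sketch (a hypothetical two-outcome MDP, not from the paper): replacing a transition-based reward R(s, a, s') by its expectation R(s, a) = E[R(s, a, S')] preserves the expected total reward but changes the reward *distribution*, so a quantile-based risk measure such as VaR is no longer correct.

```python
import numpy as np

rng = np.random.default_rng(0)

# Transition-based reward: from state s under action a, the successor is
# s1 (reward +1) or s2 (reward -1), each with probability 1/2.
# (Hypothetical one-step MDP, used only for illustration.)
def sample_transition_reward(n):
    return rng.choice([1.0, -1.0], size=n)

# State-based simplification: replace R(s, a, s') by its expectation,
# R(s, a) = 0.5 * (+1) + 0.5 * (-1) = 0.
def sample_state_reward(n):
    return np.zeros(n)

n = 100_000
r_trans = sample_transition_reward(n)
r_state = sample_state_reward(n)

# The expectation is preserved by the simplification (both means are ~0) ...
print(r_trans.mean(), r_state.mean())

# ... but the 5% quantile (a VaR-style risk measure) is -1 under the
# transition-based reward and 0 under the state-based simplification.
print(np.quantile(r_trans, 0.05), np.quantile(r_state, 0.05))
```

The two reward models agree in expectation yet disagree on every quantile below the median, which is exactly the failure mode the abstract describes for risk-sensitive objectives.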

(7 citation statements)


“…A commonly used quantile-based risk measure is value at risk (VaR). For VaR estimation with the SAT, see [7]. In short, we claim that most, if not all, inherent risk measures depend on the reward sequence (R t : t ∈ {1, · · · , N }), which can be preserved by the SAT in a risk-sensitive scenario.…”

confidence: 99%

“…In short, we claim that most, if not all, inherent risk measures depend on the reward sequence (R t : t ∈ {1, · · · , N }), which can be preserved by the SAT in a risk-sensitive scenario. For law-invariant risk estimations, since VaR estimation in a Markov process with a stochastic reward has been thoroughly studied in [7], in this paper, we focus on the first two risk estimations with Eq. ???…”

confidence: 99%
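The state-augmentation transformation (SAT) that the citing papers refer to can be sketched as follows: augment the state with the previous state and action, so that the transition-based reward becomes a state-based reward of the augmented chain, preserving the whole reward sequence rather than only its expectation. The toy dynamics below are hypothetical and for illustration only.

```python
import random

random.seed(0)

# A toy MDP with a transition-based reward R(s, a, s').
# (Hypothetical dynamics, used only to illustrate the augmentation.)
P = {(0, 'a'): [(0, 0.5), (1, 0.5)], (1, 'a'): [(0, 1.0)]}

def R(s, a, s2):
    return 1.0 if s2 == 1 else -1.0

def step(s, a):
    states, probs = zip(*P[(s, a)])
    return random.choices(states, weights=probs)[0]

# State-augmentation transformation (SAT): the augmented state is
# x_t = (s_{t-1}, a_{t-1}, s_t), and the reward becomes state-based,
# R'(x_t) = R(s_{t-1}, a_{t-1}, s_t). Because x_t carries the full
# transition, the distribution of the reward sequence (R_t) is
# preserved, which is what quantile-based measures like VaR depend on.
def rollout_augmented(s0, policy, horizon):
    s, rewards = s0, []
    for _ in range(horizon):
        a = policy(s)
        s2 = step(s, a)
        x = (s, a, s2)            # augmented state
        rewards.append(R(*x))     # state-based reward on the augmented chain
        s = s2
    return rewards

rews = rollout_augmented(0, lambda s: 'a', horizon=5)
print(rews)
```

Each augmented state determines its reward deterministically, so any law-invariant risk measure computed on the augmented chain matches the one defined on the original transition-based rewards.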