2023
DOI: 10.1137/22m1527209
Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning

Anthony Coache,
Sebastian Jaimungal,
Álvaro Cartea

Abstract: We propose a novel framework to solve risk-sensitive reinforcement learning problems where the agent optimizes time-consistent dynamic spectral risk measures. Based on the notion of conditional elicitability, our methodology constructs (strictly consistent) scoring functions that are used as penalizers in the estimation procedure. Our contribution is threefold: we (i) devise an efficient approach to estimate a class of dynamic spectral risk measures with deep neural networks, (ii) prove that these dynamic spec…
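The abstract's central ingredient is a strictly consistent scoring function: a loss whose expected value is uniquely minimized by the true value of the risk measure, which makes the measure elicitable and usable as a training penalizer. The paper's construction targets dynamic spectral risk measures; as a minimal generic illustration only (not the authors' method), the sketch below uses the classical example of an elicitable risk measure, the α-quantile (VaR), whose strictly consistent score is the pinball loss. The toy data and function names are assumptions for illustration.

```python
import numpy as np

def pinball_loss(v, y, alpha):
    """Strictly consistent scoring function for the alpha-quantile:
    S(v, y) = (1{y <= v} - alpha) * (v - y).
    Its expected value is minimized when v is the alpha-quantile of y."""
    return np.mean(((y <= v).astype(float) - alpha) * (v - y))

# Toy sample: the integers 1..100, so the 90% quantile is about 90.
y = np.arange(1.0, 101.0)
alpha = 0.9

# Minimize the empirical score over a grid of candidate quantile values.
grid = np.linspace(0.0, 105.0, 2101)
scores = np.array([pinball_loss(v, y, alpha) for v in grid])
v_star = grid[int(np.argmin(scores))]
# v_star lands near the empirical 90% quantile of the sample (~90)
```

In the deep RL setting described by the abstract, such a score replaces the usual squared error when training the network that estimates the (conditional) risk measure, so that the minimizer of the training loss is the risk measure itself rather than a conditional mean.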

Cited by 6 publications (6 citation statements)
References 62 publications
“…, while in tandem, we will also update the quadratic penalty coefficient. For ease of understanding, the block diagram of the workings of the risk-averse actor-critic algorithm [12] with the proposed augmented Lagrangian-based constraint handling mechanism is shown in Figure 2. Here, the upper shaded region contains all the learning components inside the actor-critic algorithm, while the shaded region at the bottom contains the proposed constraint handling mechanism, which updates the Lagrange multipliers and quadratic penalty factors (inside the environment) after every fixed number of the risk-averse actor-critic algorithm's training iterations.…”
Section: Developed Mechanism
confidence: 99%
“…Here, the upper shaded region contains all the learning components inside the actor-critic algorithm, while the shaded region at the bottom contains the proposed constraint handling mechanism, which updates the Lagrange multipliers and quadratic penalty factors (inside the environment) after every fixed number of the risk-averse actor-critic algorithm's training iterations. For elucidation on the conditionally elicitable dynamic risk measures and scoring functions mentioned in the following diagram, readers are referred to [12].…”
Section: Developed Mechanism
confidence: 99%