2023
DOI: 10.1016/j.neucom.2023.02.049

Distributional reinforcement learning with unconstrained monotonic neural networks

Cited by 3 publications (4 citation statements)
References 13 publications
“…Indeed, it ensures the accurate estimation of the risk, as defined in Section 3.2.2. This observation is in line with the findings of the research paper [27], introducing the UMDQN algorithm, and suggests that the solution introduced to achieve risk-sensitivity does not significantly alter the properties of the original distributional RL algorithm. Secondly, as previously explained, Figure 6 highlights the relevance of each function introduced (Q^π, R^π and U^π) for making and motivating a decision.…”
Section: Probability Distribution Visualisation (supporting)
confidence: 88%
“…The distributional RL algorithm selected to assess the soundness of the methodology introduced to learn risk-sensitive decision-making policies is the Unconstrained Monotonic Deep Q-Network with Cramer (UMDQN-C) [27]. Basically, this particular distributional RL algorithm models the CDF of the random return in a continuous way by taking advantage of the Cramer distance to derive the TD-error.…”
Section: Risk-sensitive Distributional RL Algorithm Analysed (mentioning)
confidence: 99%
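
The citation statement above says UMDQN-C uses the Cramér distance between return CDFs to derive its TD-error. As a rough illustration only, and not the cited paper's implementation, the sketch below approximates the Cramér (l2) distance between two CDFs sampled on a shared uniform grid. The names `cramer_distance` and `step_cdf`, the grid bounds, and the step-function CDFs are all assumptions made for this example; in the actual algorithm the predicted CDF would come from a neural network.

```python
import numpy as np


def cramer_distance(cdf_p, cdf_q, grid):
    """Approximate sqrt(integral of (F_P(x) - F_Q(x))^2 dx) with a Riemann sum.

    Assumes both CDFs are sampled on the same uniformly spaced grid.
    """
    dx = grid[1] - grid[0]
    return float(np.sqrt(np.sum((cdf_p - cdf_q) ** 2) * dx))


def step_cdf(grid, location):
    """CDF of a point mass at `location` (hypothetical stand-in for a learned CDF)."""
    return (grid >= location).astype(float)


# Illustrative setup: a predicted return concentrated at 0 and a target at 1.
grid = np.linspace(-5.0, 5.0, 1001)
predicted = step_cdf(grid, 0.0)
target = step_cdf(grid, 1.0)

# The two CDFs differ only between the two locations, so the loss is close to 1.
loss = cramer_distance(predicted, target, grid)
print(f"Cramér distance: {loss:.3f}")
```

Because the distance is computed from the CDFs themselves, it stays finite and informative even when the two distributions have disjoint support, which is one common motivation for using it as a distributional TD loss.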