Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/632

Risk-Averse Trust Region Optimization for Reward-Volatility Reduction

Abstract: The use of reinforcement learning in algorithmic trading is of growing interest, since it offers the opportunity of making a profit through the development of autonomous artificial traders that do not depend on hard-coded rules. In such a framework, keeping uncertainty under control is as important as maximizing expected returns. Risk aversion has been addressed in reinforcement learning through measures related to the distribution of returns. However, in trading it is essential to keep under control th…
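The abstract contrasts risk measures defined on the distribution of returns with the need to control how much individual rewards swing. The Python sketch below illustrates that distinction under definitions of our own choosing, which may differ from the paper's exact normalization: equal-length finite-horizon trajectories, normalized discount weights w_t = gamma^t / sum_k gamma^k, return G = sum_t w_t r_t, J = E[G], and reward volatility nu^2 = E[sum_t w_t (r_t - J)^2]. The function names are hypothetical. Because the weights sum to 1, Jensen's inequality gives (G - J)^2 <= sum_t w_t (r_t - J)^2 on every trajectory, so the return variance never exceeds nu^2. The converse does not hold: i.i.d. per-step noise largely averages out of G but not out of nu^2.

```python
import random


def discount_weights(gamma, horizon):
    """Normalized discount weights w_t = gamma^t / sum_k gamma^k (they sum to 1)."""
    raw = [gamma ** t for t in range(horizon)]
    total = sum(raw)
    return [x / total for x in raw]


def return_variance_and_volatility(trajectories, gamma):
    """Empirical J, variance of the weighted return G, and reward volatility nu^2.

    Assumes all trajectories have the same length. Both spreads are centred on the
    same empirical mean J, so Jensen's inequality gives var_return <= nu2 exactly.
    """
    w = discount_weights(gamma, len(trajectories[0]))
    returns = [sum(wt * r for wt, r in zip(w, traj)) for traj in trajectories]
    j = sum(returns) / len(returns)
    var_return = sum((g - j) ** 2 for g in returns) / len(returns)
    nu2 = sum(
        sum(wt * (r - j) ** 2 for wt, r in zip(w, traj)) for traj in trajectories
    ) / len(trajectories)
    return j, var_return, nu2


rng = random.Random(0)
# Synthetic rewards: small positive drift plus i.i.d. Gaussian noise (illustrative only).
trajs = [[0.1 + rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(200)]
j, var_g, nu2 = return_variance_and_volatility(trajs, gamma=0.9)
print(f"J={j:.3f}  Var[G]={var_g:.3f}  nu^2={nu2:.3f}")
```

With i.i.d. noise the printed Var[G] is far smaller than nu^2, which illustrates how a criterion on the return distribution alone can leave large step-to-step swings unpenalized.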

Cited by 20 publications (36 citation statements). References: 8 publications (13 reference statements).

“…This advancement was also noticed by researchers of the financial derivatives area, where several publications reported on promising hedging performance of trained RL agents, see e.g. [25,32,23,15,13,14,8,51]. In terms of the choice of training algorithm, the research focus has lain on variations of Deep Q-Learning (DQN) [37], which combines Q-learning [52] with a deep neural network for policy representation.…”
Section: Introduction (mentioning; confidence: 87%)
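The statement above describes DQN as Q-learning combined with a deep neural network. As an illustration only, the sketch below shows the tabular Q-learning update whose target DQN regresses with a network; the toy MDP, the step size, and the function names are hypothetical. In state 0, action 1 ends the episode with reward 1 and action 0 loops back with reward 0, so with gamma = 0.9 the fixed point is Q(0, 1) = 1 and Q(0, 0) = 0.9 * 1 = 0.9.

```python
import random


def q_learning_update(q, s, a, r, s_next, done, actions, alpha, gamma):
    """One tabular Q-learning step toward the target r + gamma * max_a' Q(s', a')."""
    target = r if done else r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def step(s, a):
    # Hypothetical toy MDP: from state 0, action 1 terminates with reward 1;
    # action 0 stays in state 0 with reward 0.
    return (None, 1.0, True) if a == 1 else (0, 0.0, False)


actions = (0, 1)
q = {(0, a): 0.0 for a in actions}
rng = random.Random(0)
for _ in range(2000):
    s = 0  # every step in this toy problem starts from state 0
    a = rng.choice(actions)  # uniform exploration; Q-learning is off-policy
    s_next, r, done = step(s, a)
    q_learning_update(q, s, a, r, s_next, done, actions, alpha=0.5, gamma=0.9)
print(q)  # Q[(0, 1)] approaches 1.0 and Q[(0, 0)] approaches 0.9
```

DQN replaces the dictionary with a neural network trained to regress the same target, adding an experience replay buffer and a separate target network for stability.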
“…The article [32] assumes that derivative prices are known and studies a DQN approach for hedging in the presence of transaction costs, focusing on mean-variance equivalent loss distributions. Article [51] studies risk-averse policy search for hedging when price information is available following the mean-volatility approach of [8].…”
Section: Introduction (mentioning; confidence: 99%)
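The statement above cites the mean-volatility approach of [8]. One way to read that objective, sketched here under an assumed normalization (weights w_t = gamma^t / sum_k gamma^k, J = E[sum_t w_t r_t], nu^2 = E[sum_t w_t (r_t - J)^2]), is that eta = J - lambda * nu^2 equals the expected weighted return of the per-step reward r_t - lambda * (r_t - J)^2, by linearity of expectation. The code checks this identity on synthetic data. The function names are hypothetical, and the sketch is not presented as the derivation used in [8].

```python
import random


def discount_weights(gamma, horizon):
    """Normalized discount weights w_t = gamma^t / sum_k gamma^k (they sum to 1)."""
    raw = [gamma ** t for t in range(horizon)]
    total = sum(raw)
    return [x / total for x in raw]


def weighted_average(trajectories, w, f):
    """Average over trajectories of sum_t w_t * f(r_t)."""
    return sum(
        sum(wt * f(r) for wt, r in zip(w, traj)) for traj in trajectories
    ) / len(trajectories)


def mean_volatility(trajectories, gamma, lam):
    """Return (eta, J) with eta = J - lam * nu^2, computed from the definitions."""
    w = discount_weights(gamma, len(trajectories[0]))
    j = weighted_average(trajectories, w, lambda r: r)
    nu2 = weighted_average(trajectories, w, lambda r: (r - j) ** 2)
    return j - lam * nu2, j


def transformed_return(trajectories, gamma, lam, j):
    """Weighted return of the shaped per-step reward r - lam * (r - j)^2."""
    w = discount_weights(gamma, len(trajectories[0]))
    return weighted_average(trajectories, w, lambda r: r - lam * (r - j) ** 2)


rng = random.Random(1)
# Synthetic per-step rewards (illustrative data, not taken from any paper).
trajs = [[rng.gauss(0.05, 0.5) for _ in range(30)] for _ in range(100)]
eta, j = mean_volatility(trajs, gamma=0.95, lam=2.0)
eta_shaped = transformed_return(trajs, gamma=0.95, lam=2.0, j=j)
print(eta, eta_shaped)  # equal up to floating-point rounding
```

The shaped reward depends on J, which itself depends on the policy, so it is not an ordinary fixed reward; fixing J, optimizing, and then re-estimating J is one way around this.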
“…A variety of algorithms for variance-related optimization in previous works can be unified and analyzed in the proposed framework. When the equivalence in (18) is concerned, we have the policy gradient for the mean-variance optimizations in discounted MDPs (Bisi et al 2020), and the policy iteration for the mean-variance optimization in discounted and average MDPs (Zhang et al 2021). However, no convergence analysis is given in either of the works, such as the analyses for the policy iterations in (Xia 2016, 2020).…”
Section: Algorithm 2 Policy Iteration Variants For Inner Optimization... (mentioning; confidence: 99%)
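The statement above mentions policy iteration for mean-variance optimization in discounted MDPs. One way to organize such a method, assumed here for illustration rather than taken from (Zhang et al 2021), rests on nu^2 = min_y E_d[(r - y)^2], so that max over pi of (J - lambda * nu^2) equals max over (pi, y) of E_d[r - lambda * (r - y)^2], where d is the normalized discounted occupancy. Alternating between ordinary policy iteration on the shaped reward r - lambda * (r - y)^2 with y fixed, and resetting y = J_pi, never decreases the objective in this exact tabular setting. The 3-state MDP (a "safe" and a "risky" action in state 0) and all function names are hypothetical.

```python
def evaluate(policy, reward, P, gamma, tol=1e-12):
    """Iterative policy evaluation for a deterministic tabular policy."""
    n = len(P)
    v = [0.0] * n
    while True:
        new_v = [reward[s][policy[s]]
                 + gamma * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
                 for s in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def policy_iteration(reward, P, gamma, policy):
    """Standard policy iteration; an action changes only on strict improvement."""
    policy = list(policy)
    while True:
        v = evaluate(policy, reward, P, gamma)
        stable = True
        for s in range(len(P)):
            q = [reward[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))
                 for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            if q[best] > q[policy[s]] + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy


def occupancy(policy, P, gamma, mu, tol=1e-12):
    """Normalized discounted state occupancy d = (1 - gamma) * mu^T (I - gamma P_pi)^-1."""
    n = len(P)
    d = list(mu)
    while True:
        new_d = [(1 - gamma) * mu[t]
                 + gamma * sum(d[s] * P[s][policy[s]][t] for s in range(n))
                 for t in range(n)]
        if max(abs(a - b) for a, b in zip(new_d, d)) < tol:
            return new_d
        d = new_d


def mean_volatility(policy, R, P, gamma, mu, lam):
    """Return (J - lam * nu^2, J) under the occupancy of a deterministic policy."""
    d = occupancy(policy, P, gamma, mu)
    rewards = [R[s][policy[s]] for s in range(len(P))]
    j = sum(ds * r for ds, r in zip(d, rewards))
    nu2 = sum(ds * (r - j) ** 2 for ds, r in zip(d, rewards))
    return j - lam * nu2, j


def mean_volatility_pi(R, P, gamma, mu, lam, policy, max_outer=50):
    """Alternate y <- J_pi with policy iteration on the shaped reward r - lam * (r - y)^2."""
    history = []
    for _ in range(max_outer):
        eta, y = mean_volatility(policy, R, P, gamma, mu, lam)
        history.append(eta)
        shaped = [[r - lam * (r - y) ** 2 for r in row] for row in R]
        new_policy = policy_iteration(shaped, P, gamma, policy)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, history


# Hypothetical MDP. State 0: action 0 ("safe") pays 0.5 and stays; action 1 ("risky")
# pays 0 and moves to state 1 or 2 with probability 0.5 each. State 1 pays 3.0,
# state 2 pays 0.0, and both return to state 0 under either action.
P = [[[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]],
     [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
     [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
R = [[0.5, 0.0], [3.0, 3.0], [0.0, 0.0]]
mu = [1.0, 0.0, 0.0]
for lam in (0.0, 1.0):
    final_policy, history = mean_volatility_pi(R, P, 0.9, mu, lam, policy=[1, 0, 0])
    print(lam, final_policy, [round(h, 3) for h in history])
```

With lambda = 0 the loop keeps the risky action, which has the higher discounted mean; with lambda = 1 it switches to the safe action, and the recorded objective values are non-decreasing.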
“…This work is later extended to the mean-variance optimization in average MDPs (Xia 2020). Bisi et al (2020) study the discounted mean-variance in RL, where the steady-state variance is evaluated to bound the limiting average variance. They develop a gradient-based trust region policy optimization (originally proposed by Schulman et al (2015)) algorithm with a monotonic policy improvement.…”
Section: Introduction (mentioning; confidence: 99%)
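The statement above describes a trust region policy optimization algorithm in the style of Schulman et al (2015). The sketch below shows only the acceptance logic of a TRPO-style backtracking line search for a softmax policy over a few actions: a proposed step is shrunk until KL(pi_old || pi_new) <= delta and the importance-weighted surrogate improves. For brevity it substitutes the plain surrogate gradient for TRPO's natural-gradient direction, the advantages are made-up numbers rather than estimates for a mean-volatility objective, and the monotonic-improvement guarantee of the cited work is not reproduced. All names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl(p, q):
    """KL(p || q) for discrete distributions; q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def surrogate(theta, old_probs, advantages):
    """TRPO-style surrogate E_{a ~ pi_old}[(pi_theta(a) / pi_old(a)) * A(a)]."""
    probs = softmax(theta)
    return sum(po * (pn / po) * adv for po, pn, adv in zip(old_probs, probs, advantages))


def trust_region_step(theta, advantages, max_kl=0.01, step=1.0, shrink=0.5, max_tries=20):
    """Backtracking line search: accept the first step with KL <= max_kl that improves the surrogate."""
    old_probs = softmax(theta)
    base = surrogate(theta, old_probs, advantages)
    mean_adv = sum(p * a for p, a in zip(old_probs, advantages))
    # Gradient of sum_a pi(a) A(a) w.r.t. the softmax logits: pi(a) * (A(a) - E_pi[A]).
    grad = [p * (a - mean_adv) for p, a in zip(old_probs, advantages)]
    for _ in range(max_tries):
        candidate = [t + step * g for t, g in zip(theta, grad)]
        if (kl(old_probs, softmax(candidate)) <= max_kl
                and surrogate(candidate, old_probs, advantages) > base):
            return candidate
        step *= shrink
    return list(theta)  # no acceptable step found: keep the old policy


theta = [0.0, 0.0, 0.0]
new_theta = trust_region_step(theta, advantages=[1.0, 0.0, -1.0])
print(new_theta, kl(softmax(theta), softmax(new_theta)))
```

With advantages [1, 0, -1] and delta = 0.01, the full step violates the KL bound (about 0.037) and the half step is accepted.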