2023
DOI: 10.1111/mafi.12382
Recent advances in reinforcement learning in finance

Abstract: The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques for data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems, which rely heavily on model assumptions, new developments in reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions…

Citations: cited by 46 publications (13 citation statements)
References: 248 publications (565 reference statements)
“…SARSA is an example of a value-based RL method, as the aim of SARSA is to find the optimal Q-function by directly evaluating which state-action pairs are more promising in terms of expected cumulative future rewards [9]. Value methods such as SARSA do not explicitly consider the optimal policy [12]. Conversely, policy-based methods keep the current estimate of the optimal policy in memory during the learning process, and updates are performed on this optimal policy estimate rather than the value function [12].…”
Section: Reinforcement Learning 2.1 Fundamentals (mentioning)
confidence: 99%
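The on-policy update this statement describes is easy to make concrete. Below is a minimal tabular SARSA sketch, assuming a hypothetical environment object `env` with discrete states and actions exposing `reset()` and `step(action)`; it illustrates the temporal-difference update the excerpt refers to and is not code from the cited papers.

```python
import numpy as np

def sarsa(env, n_states, n_actions, episodes=500,
          alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular SARSA: estimate Q(s, a) from on-policy transitions."""
    Q = np.zeros((n_states, n_actions))

    def eps_greedy(s):
        # Act greedily w.r.t. Q, exploring with probability epsilon.
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        s = env.reset()                    # hypothetical API: initial state
        a = eps_greedy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)      # hypothetical API: (state, reward, done)
            a2 = eps_greedy(s2)
            # On-policy TD target: uses the action a2 actually taken next,
            # which is what distinguishes SARSA from Q-learning.
            target = r + gamma * Q[s2, a2] * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s2, a2
    return Q
```

The learned Q-table encodes which state-action pairs promise higher expected cumulative reward; a greedy policy can then be read off as the argmax over actions, which is exactly the sense in which value methods never store the policy explicitly.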
“…Value methods such as SARSA do not explicitly consider the optimal policy [12]. Conversely, policy-based methods keep the current estimate of the optimal policy in memory during the learning process, and updates are performed on this optimal policy estimate rather than on the value function [12]. Using a value method rather than a policy method may impose the constraint of a discrete action space.…”
Section: Reinforcement Learning 2.1 Fundamentals (mentioning)
confidence: 99%
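The discrete-action constraint mentioned in the last sentence is typically handled by bucketing the continuous control into a finite grid before a tabular value method is applied. A minimal sketch, with hypothetical bounds on a trade-size variable:

```python
import numpy as np

# Hypothetical example: a continuous trade size in [-1, 1] is mapped onto
# 11 discrete actions so that a tabular method such as SARSA applies.
ACTION_GRID = np.linspace(-1.0, 1.0, 11)

def to_action_index(trade_size: float) -> int:
    """Nearest-neighbor mapping from a continuous control to an action index."""
    return int(np.argmin(np.abs(ACTION_GRID - trade_size)))

def to_trade_size(action_index: int) -> float:
    """Inverse mapping from an action index back to the control value."""
    return float(ACTION_GRID[action_index])
```

The grid resolution is a design choice: a finer grid reduces the approximation error of the discretization but enlarges the Q-table that must be learned.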
“…In practice, market makers interact with client order flow and learn to adjust their quotes in order to maximize their profit. We model this learning process using a decentralized multi-agent deep reinforcement learning algorithm (Hambly et al., 2023) using a policy gradient method (Fazel et al., 2018) to update market makers' strategies, parameterized via neural networks. Our simulation results show that the interaction of market making algorithms through market prices, without any sharing of information, may give rise to tacit collusion, as evidenced by quoted spread levels significantly higher than in competitive (Nash) equilibrium.…”
Section: Introduction (mentioning)
confidence: 99%
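The learning loop this excerpt describes, neural-network quoting policies updated by policy gradient, can be sketched generically. The snippet below is a single REINFORCE-style update in PyTorch with a Gaussian policy over the quoted spread; the state features, reward definition, and multi-agent interaction of the cited work are not reproduced here, and all names are illustrative.

```python
import torch
import torch.nn as nn

class QuotePolicy(nn.Module):
    """Maps market-state features to a Gaussian over the quoted spread."""
    def __init__(self, state_dim: int, hidden: int = 32):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.log_std = nn.Parameter(torch.zeros(1))

    def forward(self, states: torch.Tensor) -> torch.distributions.Normal:
        return torch.distributions.Normal(self.mean_net(states),
                                          self.log_std.exp())

def reinforce_step(policy, optimizer, states, actions, returns):
    """One policy-gradient step: ascend E[log pi(a|s) * G]."""
    dist = policy(states)
    log_prob = dist.log_prob(actions).squeeze(-1)
    loss = -(log_prob * returns).mean()   # minimize the negative objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the decentralized setting the excerpt describes, each market maker would run its own copy of such an update on its own profit stream, coupling to the other agents only through the market prices their quotes jointly produce.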
“…The Markov decision problem leads to an infinite-horizon stochastic optimal control problem in discrete time, which finds many applications in finance and economics; see, for example, Bäuerle and Rieder (2011), Hambly et al. (2021), or White (1993) for an overview. It can, among a multitude of other applications, be used to learn the optimal structure of portfolios and the optimal trading behavior; see, for example, Bertoluzzo and Corazza (2012), Chang and Lee (2017), Gold (2003), Hu and Lin (2019), Xiong et al.…”
Section: Introduction (mentioning)
confidence: 99%
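The infinite-horizon discounted problem referenced here satisfies the Bellman fixed point V(s) = max_a [ r(s, a) + γ Σ_{s'} p(s' | s, a) V(s') ], which value iteration solves to arbitrary tolerance when the model is known (the discount γ < 1 makes the update a contraction). A minimal tabular sketch, assuming hypothetical arrays `P` and `R` rather than any specific model from the cited literature:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Solve a discounted infinite-horizon MDP by value iteration.

    P: (A, S, S) array, P[a, s, s'] = transition probability.
    R: (S, A) array of expected one-step rewards.
    """
    n_states = R.shape[0]
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # value function and greedy policy
        V = V_new
```

Reinforcement learning methods such as those surveyed in the article address the same fixed point when `P` and `R` are unknown and must instead be learned from sampled transitions.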