2023
DOI: 10.1111/mafi.12382
Recent advances in reinforcement learning in finance

Abstract: The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques for data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems, which rely heavily on model assumptions, new developments in reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions…

Citations: cited by 46 publications (13 citation statements)
References: 248 publications (565 reference statements)
“…SARSA is an example of a value-based RL method, as the aim of SARSA is to find the optimal Q-function by directly evaluating which state-action pairs are more promising in terms of expected cumulative future rewards [9]. Value methods such as SARSA do not explicitly consider the optimal policy [12]. Conversely, policy-based methods keep the current estimate of the optimal policy in memory during the learning process, and updates are performed on this optimal policy estimate rather than the value function [12].…”
Section: Reinforcement Learning 2.1 Fundamentals (mentioning)
confidence: 99%
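The on-policy update this statement describes is easy to make concrete. Below is a minimal tabular SARSA sketch, assuming a hypothetical environment object `env` with discrete states and actions exposing `reset()` and `step(action)`; it illustrates the temporal-difference update the excerpt refers to and is not code from the cited papers.

```python
import numpy as np

def sarsa(env, n_states, n_actions, episodes=500,
          alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular SARSA: estimate Q(s, a) from on-policy transitions."""
    Q = np.zeros((n_states, n_actions))

    def eps_greedy(s):
        # Act greedily w.r.t. Q, exploring with probability epsilon.
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        s = env.reset()                    # hypothetical API: initial state
        a = eps_greedy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)      # hypothetical API: (state, reward, done)
            a2 = eps_greedy(s2)
            # On-policy TD target: uses the action a2 actually taken next,
            # which is what distinguishes SARSA from Q-learning.
            target = r + gamma * Q[s2, a2] * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s2, a2
    return Q
```

The learned Q-table encodes which state-action pairs promise higher expected cumulative reward; a greedy policy can then be read off as the argmax over actions, which is exactly the sense in which value methods never store the policy explicitly.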
“…Value methods such as SARSA do not explicitly consider the optimal policy [12]. Conversely, policy-based methods keep the current estimate of the optimal policy in memory during the learning process, and updates are performed on this optimal policy estimate rather than on the value function [12]. Using a value method rather than a policy method may impose the constraint of a discrete action space.…”
Section: Reinforcement Learning 2.1 Fundamentals (mentioning)
confidence: 99%
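The discrete-action constraint mentioned in the last sentence is typically handled by bucketing the continuous control into a finite grid before a tabular value method is applied. A minimal sketch, with hypothetical bounds on a trade-size variable:

```python
import numpy as np

# Hypothetical example: a continuous trade size in [-1, 1] is mapped onto
# 11 discrete actions so that a tabular method such as SARSA applies.
ACTION_GRID = np.linspace(-1.0, 1.0, 11)

def to_action_index(trade_size: float) -> int:
    """Nearest-neighbor mapping from a continuous control to an action index."""
    return int(np.argmin(np.abs(ACTION_GRID - trade_size)))

def to_trade_size(action_index: int) -> float:
    """Inverse mapping from an action index back to the control value."""
    return float(ACTION_GRID[action_index])
```

The grid resolution is a design choice: a finer grid reduces the approximation error of the discretization but enlarges the Q-table that must be learned.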
“…In practice, market makers interact with client order flow and learn to adjust their quotes in order to maximize their profit. We model this learning process using a decentralized multi-agent deep reinforcement learning algorithm (Hambly et al., 2023) using a policy gradient method (Fazel et al., 2018) to update market makers' strategies, parameterized via neural networks. Our simulation results show that the interaction of market making algorithms through market prices, without any sharing of information, may give rise to tacit collusion, as evidenced by quoted spread levels significantly higher than in competitive (Nash) equilibrium.…”
Section: Introduction (mentioning)
confidence: 99%
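The learning loop this excerpt describes, neural-network quoting policies updated by policy gradient, can be sketched generically. The snippet below is a single REINFORCE-style update in PyTorch with a Gaussian policy over the quoted spread; the state features, reward definition, and multi-agent interaction of the cited work are not reproduced here, and all names are illustrative.

```python
import torch
import torch.nn as nn

class QuotePolicy(nn.Module):
    """Maps market-state features to a Gaussian over the quoted spread."""
    def __init__(self, state_dim: int, hidden: int = 32):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.log_std = nn.Parameter(torch.zeros(1))

    def forward(self, states: torch.Tensor) -> torch.distributions.Normal:
        return torch.distributions.Normal(self.mean_net(states),
                                          self.log_std.exp())

def reinforce_step(policy, optimizer, states, actions, returns):
    """One policy-gradient step: ascend E[log pi(a|s) * G]."""
    dist = policy(states)
    log_prob = dist.log_prob(actions).squeeze(-1)
    loss = -(log_prob * returns).mean()   # minimize the negative objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the decentralized setting the excerpt describes, each market maker would run its own copy of such an update on its own profit stream, coupling to the other agents only through the market prices their quotes jointly produce.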
“…The Markov decision problem leads to an infinite-horizon stochastic optimal control problem in discrete time, which finds many applications in finance and economics; see, for example, Bäuerle and Rieder (2011), Hambly et al. (2021), or White (1993) for an overview. It can, among a multitude of other applications, be used to learn the optimal structure of portfolios and the optimal trading behavior; see, for example, Bertoluzzo and Corazza (2012), Chang and Lee (2017), Gold (2003), Hu and Lin (2019), Xiong et al.…”
Section: Introduction (mentioning)
confidence: 99%
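The infinite-horizon discounted problem referenced here satisfies the Bellman fixed point V(s) = max_a [ r(s, a) + γ Σ_{s'} p(s' | s, a) V(s') ], which value iteration solves to arbitrary tolerance when the model is known (the discount γ < 1 makes the update a contraction). A minimal tabular sketch, assuming hypothetical arrays `P` and `R` rather than any specific model from the cited literature:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Solve a discounted infinite-horizon MDP by value iteration.

    P: (A, S, S) array, P[a, s, s'] = transition probability.
    R: (S, A) array of expected one-step rewards.
    """
    n_states = R.shape[0]
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # value function and greedy policy
        V = V_new
```

Reinforcement learning methods such as those surveyed in the article address the same fixed point when `P` and `R` are unknown and must instead be learned from sampled transitions.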