Universal Trading for Order Execution with Oracle Policy Distillation

Yu, Fang; Ren, Kan; Liu, Weiqing; Zhou, Dong; Zhang, Weinan; Bian, Jiang; Yu, Yong; Liu, Tie-Yan

doi:10.1609/aaai.v35i1.16083

Cited by 13 publications

(13 citation statements)

References 34 publications


“…In quantitative finance, the primary goal of the investor is to maximize the long-term value through continuously trading of multiple assets in the market [1,2]. The process consists of two parts, portfolio management, which dynamically allocate the portfolio across the assets, and order execution whose goal is to fulfill a number of acquisition or liquidation orders specified by the portfolio management strategy, within a time horizon, and close the loop of investment [3,4]. Figure (1a) presents the trading process within one trading day.…”

Section: Introduction

mentioning

confidence: 99%

“…Although there exists many works for order execution, few of them manage to address the above three challenges. Traditional financial model based methods [5][6][7] and some recently developed model-free reinforcement learning (RL) methods [4,8,9] only optimize the strategy for single-order execution without considering practice of multi-order execution, which would result in low trading efficacy. Moreover, it is not applicable to directly transfer the existing methods to multi-order execution since utilizing only one agent to conduct the execution of multiple orders would lead to scalability issue as the action space of one individual agent grows exponentially with the number of orders.…”

Section: Introduction

mentioning

confidence: 99%


Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance

Tang, Ren, et al. 2023

Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Self Cite


Order execution is a fundamental task in quantitative finance, aiming at finishing acquisition or liquidation for a number of trading orders of the specific assets. Recent advance in model-free reinforcement learning (RL) provides a data-driven solution to the order execution problem. However, the existing works always optimize execution for an individual order, overlooking the practice that multiple orders are specified to execute simultaneously, resulting in suboptimality and bias. In this paper, we first present a multi-agent RL (MARL) method for multi-order execution considering practical constraints. Specifically, we treat every agent as an individual operator to trade one specific order, while keeping communicating with each other and collaborating for maximizing the overall profits. Nevertheless, the existing MARL algorithms often incorporate communication among agents by exchanging only the information of their partial observations, which is inefficient in complicated financial market. To improve collaboration, we then propose a learnable multi-round communication protocol, for the agents communicating the intended actions with each other and refining accordingly. It is optimized through a novel action value attribution method which is provably consistent with the original learning objective yet more efficient. The experiments on the data from two real-world markets have illustrated superior performance with significantly better collaboration effectiveness achieved by our method.
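The scalability concern raised in the second statement above (a single agent's action space growing exponentially with the number of orders) can be illustrated with a small count. This is only a sketch under an assumption not stated in the source: each order is given the same number of discrete actions per step. The function name and the value of 5 actions per order are hypothetical.

```python
def joint_action_count(actions_per_order: int, num_orders: int) -> int:
    """Size of the joint action space when one agent controls every order.

    Assumes (hypothetically) that each order has the same number of
    discrete actions available at each decision step.
    """
    return actions_per_order ** num_orders


# With 5 actions per order, the joint space grows as 5**n.
for n in (1, 2, 10):
    print(n, joint_action_count(5, n))
```

Under the same assumption, assigning one agent per order keeps each agent's own action space at a fixed size regardless of how many orders are executed together.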


“…Reinforcement learning (RL) has achieved remarkable progress in games [31,47,50], financial trading [8] and robotics [13]. However, in its core part, without designs tailored to specific tasks, general RL paradigms are still learning implicit representations from critic loss (value predictions) and actor loss (maximizing cumulative reward).…”

Section: Introduction

mentioning

confidence: 99%

Reinforcement Learning with Automated Auxiliary Loss Search

He, Zhang, Ren, et al. 2022

Preprint


A good state representation is crucial to solving complicated reinforcement learning (RL) challenges. Many recent works focus on designing auxiliary losses for learning informative representations. Unfortunately, these handcrafted objectives rely heavily on expert knowledge and may be sub-optimal. In this paper, we propose a principled and universal method for learning better representations with auxiliary loss functions, named Automated Auxiliary Loss Search (A2LS), which automatically searches for top-performing auxiliary loss functions for RL. Specifically, based on the collected trajectory data, we define a general auxiliary loss space of size 7.5 × 10^20 and explore the space with an efficient evolutionary search strategy. Empirical results show that the discovered auxiliary loss (namely, A2-winner) significantly improves the performance on both high-dimensional (image) and low-dimensional (vector) unseen tasks with much higher efficiency, showing promising generalization ability to different settings and even different benchmark domains. We conduct a statistical analysis to reveal the relations between patterns of auxiliary losses and RL performance. The codes and supplementary materials are available at https://seqml.github.io/a2ls.


“…Market traders buy and sell volatile assets frequently, with a goal to maximize their total return. Nowadays, it is not difficult for us to find trading strategies suitable for our preferences in many academic articles or forums [1]. However, it is still a problem of how to distinguish the good and bad of these strategies and avoid making some common mistakes, such as survivorship bias, look-ahead bias, and trading cost [2].…”

Section: Introduction

mentioning

confidence: 99%

“…Let RP be the score given by RP, Seq be the score of Consecutive Days and His of Historical Data. The mapping relationship is: Conf = (His + Seq + RP − 1000)/50000 + 0.94 (1)…”

mentioning

confidence: 99%

LSTMcon: A Novel System of Portfolio Management Based on Feedback LSTM with Confidence

Xie, Gai, Guo, et al. 2022

Proceedings of the 34th International Conference on Software Engineering and Knowledge Engineering


Trading carries a substantial amount of risk and making adequately informed decisions cannot be overemphasized. In order to propose a more reasonable strategy on portfolio arrangement, we design LSTMcon, a two-stage system that consists of a assets price prediction model and a decision-making strategy based on ensemble rules. As for next-day price prediction, we implement an LSTM model with feedback mechanism and devise a series of training settings. The feedback mechanism uses the deviation between predicted price and actual price to correct the prediction result from LSTM. To decrease the transaction cost, we design a three-day trading period and adopt an iterative prediction approach. Our model achieves the accuracy of 98.5% on GOLD and 98.8% on BTC finally. In addition, we devise a decision-making system after getting the predicted data. We modify the predicted price by giving everyone a certain confidence level based on three approaches (reward and punishment mechanism, sequential days rules, historical price relying). We combine these rules and give a comprehensive confidence level to weigh the predicted price. Subsequently, we summarize the transactions into 8 trading operations, input the modified price and automatically compare the hypothetical return of these eight operations. Then, output the operation with largest return as today's decision. We compare the returns and transaction costs of comparative systems, and demonstrate our strategy with effectiveness.
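The confidence mapping quoted in the LSTMcon citation statement above, Conf = (His + Seq + RP − 1000)/50000 + 0.94, can be written out directly. The sketch below is a literal transcription of that one equation only. The function name and the sample scores are hypothetical, and the valid ranges of the three scores are not given in the quoted text.

```python
def confidence(his: float, seq: float, rp: float) -> float:
    """Combine the three rule scores into one confidence level.

    Transcribes the quoted mapping:
        Conf = (His + Seq + RP - 1000) / 50000 + 0.94
    The score ranges are not stated in the quoted statement, so the
    inputs are not validated here.
    """
    return (his + seq + rp - 1000) / 50000 + 0.94


# Hypothetical scores summing to 1000: the first term is zero,
# leaving only the 0.94 offset.
print(confidence(400, 300, 300))  # 0.94
```

Read this way, every 500 points of combined score above or below 1000 shifts the confidence by 0.01 around the 0.94 baseline.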
