2019
DOI: 10.1609/aiide.v15i1.5230

Macro Action Selection with Deep Reinforcement Learning in StarCraft

Abstract: StarCraft (SC) is one of the most popular and successful Real Time Strategy (RTS) games. In recent years, SC has also been widely accepted as a challenging testbed for AI research because of its enormous state space, partially observable information, multi-agent collaboration, and so on. With the help of the annual AIIDE and CIG competitions, a growing number of SC bots have been proposed and continuously improved. However, a large gap remains between the top-level bots and professional human players. One vital reason is that…

Cited by 18 publications (1 citation statement)
References 12 publications (15 reference statements)
“…The Ape-X algorithm separates data collection from policy learning: it uses multiple parallel actors to collect experience data, shares one large experience replay buffer, and feeds sampled batches to a learner for training. The original Ape-X, built on DQN and Deep Deterministic Policy Gradient (DDPG), has been applied to a feedback flow separation control system [53], StarCraft games [54], and vehicle control for autonomous driving [55]. The two characteristics where this paper most clearly improves are the following: first, we connect Ape-X with a distributed reinforcement learning framework for off-policy learning.…”
Section: Introduction (mentioning)
confidence: 99%
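The quoted passage describes Ape-X's core architectural idea: many parallel actors generate experience and push it, with priorities, into one shared replay buffer, while a separate learner samples prioritized batches to update the policy. The toy Python sketch below illustrates that actor/learner split; the class and function names, the dummy environment, and the |reward|-based priority are illustrative assumptions, not the implementation from [54] or the cited paper.

```python
import random
import collections

# Toy sketch of the Ape-X actor/learner split described in the quote above.
# Everything here (names, the dummy environment, the priority formula) is
# illustrative, not the paper's actual implementation.

Transition = collections.namedtuple("Transition", "state action reward next_state")

class ReplayBuffer:
    """Shared buffer: actors push prioritized transitions, the learner samples them."""
    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self.items = []  # list of (priority, transition) pairs

    def add(self, priority, transition):
        self.items.append((priority, transition))
        if len(self.items) > self.capacity:
            self.items.pop(0)  # drop the oldest transition when full

    def sample(self, batch_size):
        # Priority-proportional sampling (a stand-in for Ape-X's sum tree).
        priorities = [p for p, _ in self.items]
        batch = random.choices(self.items, weights=priorities, k=batch_size)
        return [t for _, t in batch]

def actor(buffer, steps=100):
    """One of many parallel data collectors, here run on a dummy environment."""
    state = 0.0
    for _ in range(steps):
        action = random.choice([0, 1])    # placeholder policy
        reward = random.random()          # placeholder environment dynamics
        next_state = state + reward
        # Actors assign an initial priority; |reward| is a toy TD-error proxy.
        buffer.add(abs(reward) + 1e-3, Transition(state, action, reward, next_state))
        state = next_state

def learner(buffer, updates=10, batch_size=32):
    """Consumes prioritized batches; a real learner would update network weights."""
    for step in range(updates):
        batch = buffer.sample(batch_size)
        mean_reward = sum(t.reward for t in batch) / len(batch)
        print(f"update {step}: batch mean reward = {mean_reward:.3f}")

if __name__ == "__main__":
    buf = ReplayBuffer()
    for _ in range(4):   # several actors feeding one shared buffer
        actor(buf)
    learner(buf)
```

In the full algorithm the actors and the learner run as separate processes, the actors periodically refresh their network weights from the learner, and the buffer updates priorities from the learner's TD errors; this sketch keeps only the data-collection/learning separation that the citing paper highlights.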