2020 IEEE International Conference on Progress in Informatics and Computing (PIC)
DOI: 10.1109/pic50277.2020.9350833
Fast-PPO: Proximal Policy Optimization with Optimal Baseline Method

Cited by 2 publications (5 citation statements) | References: 9 publications
“…The profitable maneuvers in a complex and dynamic stock market make it more challenging to design accurate MTS classification systems and to predict multivariate trade data points [7]-[8]. Therefore, the conventional method is first applied to measure the expected stock return.…”
Section: A. Motivation and Contributions (mentioning)
confidence: 99%
“…Therefore, a robust MTS classification system must be designed for accurate market prediction [10]-[12], [13]. The proposed framework identifies missing or faulty components of MTS data to improve its overall accuracy [8]. Hence, the proposed framework provides better representations of faulty and non-faulty components of multivariate time series data using partially ordered set (POSET)-based Hasse representations built through mathematical modelling (shown in Fig. 2).…”
Section: A. Motivation and Contributions (mentioning)
confidence: 99%
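The excerpt names a POSET-based Hasse representation of faulty and non-faulty MTS components but does not state the order relation. The sketch below is a minimal, hypothetical illustration that orders components by inclusion of their observed timestamps and derives the Hasse (covering) edges; the `components` layout, the inclusion order, and all names are assumptions, not the cited framework.

```python
# Hypothetical sketch: derive Hasse (covering) edges for MTS components
# partially ordered by strict inclusion of their observed timestamps.
# The order relation and data layout are assumptions, not the cited method.

def hasse_edges(components):
    """components: dict mapping component name -> set of observed timestamps."""
    names = list(components)

    def lt(a, b):
        # a is strictly below b iff a's observations are a proper subset of b's.
        return components[a] < components[b]

    edges = []
    for a in names:
        for b in names:
            if lt(a, b):
                # Keep (a, b) only if no component c lies strictly between them.
                if not any(lt(a, c) and lt(c, b) for c in names):
                    edges.append((a, b))
    return edges

# Toy example: a sensor with missing readings sits below a fully observed one.
mts = {
    "sensor_full":  {0, 1, 2, 3},
    "sensor_gappy": {0, 2},
    "sensor_dead":  set(),
}
print(hasse_edges(mts))  # [('sensor_gappy', 'sensor_full'), ('sensor_dead', 'sensor_gappy')]
```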
“…The proposed algorithm adopted the idea used in MAML meta-learning. During the base learner's learning process, the methods reported in [26]-[30] were adopted, and concepts from the PPO deep reinforcement learning algorithm, such as experience replay, the value neural network, and the target neural network, were also adopted.…”
Section: Base Learner (mentioning)
confidence: 99%
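The excerpt lists the ingredients the base learner borrows from PPO (experience replay, a value network, a target network) but not their concrete form. Below is a minimal PyTorch-style sketch of those pieces; the network sizes, the Polyak soft-update rule, and all identifiers are illustrative assumptions, not the cited architecture.

```python
# Hypothetical sketch of the base-learner ingredients the excerpt mentions:
# a policy network, a value network with a slowly updated target copy,
# and a simple experience-replay buffer. Sizes and update rule are assumptions.
import copy
import random
from collections import deque

import torch.nn as nn


def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                         nn.Linear(hidden, out_dim))


class BaseLearner:
    def __init__(self, obs_dim, act_dim):
        self.policy = mlp(obs_dim, act_dim)             # actor
        self.value = mlp(obs_dim, 1)                    # critic
        self.target_value = copy.deepcopy(self.value)   # target network
        self.replay = deque(maxlen=10_000)              # experience replay

    def store(self, transition):
        self.replay.append(transition)                  # (s, a, r, s', done)

    def sample(self, batch_size=64):
        return random.sample(self.replay, min(batch_size, len(self.replay)))

    def soft_update(self, tau=0.01):
        # Polyak averaging of the target value network (assumed update rule).
        for p, tp in zip(self.value.parameters(), self.target_value.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)
```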
“…In the inner loop, the Meta-PPO algorithm uses a small amount of data from a randomly chosen task τ as the learning data to update the model parameters, reducing the model's loss on task τ. In this loop, the model parameter updating process is the same as in the PPO algorithm proposed in [26]-[30]. The neural network of the algorithm learns from several batches of data on the randomly chosen tasks.…”
Section: Base Learner (mentioning)
confidence: 99%
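The inner-loop update the excerpt refers to is the standard PPO parameter update. The sketch below shows one such update on a batch drawn from a single task, using the clipped surrogate objective plus a value-function loss; the clip range, loss coefficients, and all tensor names are assumptions used only for illustration.

```python
# Hypothetical sketch of one PPO-style inner-loop update on a batch drawn
# from a single task, using the standard clipped surrogate objective.
# clip_eps, vf_coef, and variable names are illustrative assumptions.
import torch


def ppo_inner_update(policy, value, optimizer, batch, clip_eps=0.2, vf_coef=0.5):
    obs, actions, old_log_probs, returns, advantages = batch

    dist = torch.distributions.Categorical(logits=policy(obs))
    log_probs = dist.log_prob(actions)

    # Probability ratio between the updated and the old policy.
    ratio = torch.exp(log_probs - old_log_probs)

    # Clipped surrogate objective (maximized, so we minimize its negative).
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Value-function regression toward the empirical returns.
    value_loss = (value(obs).squeeze(-1) - returns).pow(2).mean()

    loss = policy_loss + vf_coef * value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeating this update over several batches from the sampled task, then measuring the post-update loss for the outer (meta) step, reflects the inner/outer structure the excerpt describes.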