2020 IEEE International Conference on Progress in Informatics and Computing (PIC)
DOI: 10.1109/pic50277.2020.9350833
Fast-PPO: Proximal Policy Optimization with Optimal Baseline Method

Cited by 2 publications (5 citation statements) | References: 9 publications
“…The profitable maneuvers in a complex and dynamic stock market make it more challenging to design accurate MTS classification systems and to predict multivariate trade data points [7]-[8]. Therefore, the conventional method is first applied to measure the expected stock return.…”
Section: A. Motivation and Contributions (mentioning)
confidence: 99%
“…Therefore, a robust MTS classification system must be designed for accurate market prediction [10]-[12], [13]. The proposed framework identifies missing or faulty components of MTS data to improve its overall accuracy [8]. Hence, the proposed framework provides better representations of faulty and non-faulty components of multivariate time series data using partially ordered set (POSET)-based Hasse representations built through mathematical modelling (shown in Fig. 2).…”
Section: A. Motivation and Contributions (mentioning)
confidence: 99%
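The excerpt names a POSET-based Hasse representation of faulty and non-faulty MTS components but does not state the order relation. The sketch below is a minimal, hypothetical illustration that orders components by inclusion of their observed timestamps and derives the Hasse (covering) edges; the `components` layout, the inclusion order, and all names are assumptions, not the cited framework.

```python
# Hypothetical sketch: derive Hasse (covering) edges for MTS components
# partially ordered by strict inclusion of their observed timestamps.
# The order relation and data layout are assumptions, not the cited method.

def hasse_edges(components):
    """components: dict mapping component name -> set of observed timestamps."""
    names = list(components)

    def lt(a, b):
        # a is strictly below b iff a's observations are a proper subset of b's.
        return components[a] < components[b]

    edges = []
    for a in names:
        for b in names:
            if lt(a, b):
                # Keep (a, b) only if no component c lies strictly between them.
                if not any(lt(a, c) and lt(c, b) for c in names):
                    edges.append((a, b))
    return edges

# Toy example: a sensor with missing readings sits below a fully observed one.
mts = {
    "sensor_full":  {0, 1, 2, 3},
    "sensor_gappy": {0, 2},
    "sensor_dead":  set(),
}
print(hasse_edges(mts))  # [('sensor_gappy', 'sensor_full'), ('sensor_dead', 'sensor_gappy')]
```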
“…The proposed algorithm adopted the idea used in MAML meta-learning. During the base learner's learning process, the methods reported in [26]-[30] were adopted, and concepts from the PPO deep reinforcement learning algorithm, such as experience replay, the value neural network, and the target neural network, were also adopted.…”
Section: Base Learner (mentioning)
confidence: 99%
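The excerpt lists the ingredients the base learner borrows from PPO (experience replay, a value network, a target network) but not their concrete form. Below is a minimal PyTorch-style sketch of those pieces; the network sizes, the Polyak soft-update rule, and all identifiers are illustrative assumptions, not the cited architecture.

```python
# Hypothetical sketch of the base-learner ingredients the excerpt mentions:
# a policy network, a value network with a slowly updated target copy,
# and a simple experience-replay buffer. Sizes and update rule are assumptions.
import copy
import random
from collections import deque

import torch.nn as nn


def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                         nn.Linear(hidden, out_dim))


class BaseLearner:
    def __init__(self, obs_dim, act_dim):
        self.policy = mlp(obs_dim, act_dim)             # actor
        self.value = mlp(obs_dim, 1)                    # critic
        self.target_value = copy.deepcopy(self.value)   # target network
        self.replay = deque(maxlen=10_000)              # experience replay

    def store(self, transition):
        self.replay.append(transition)                  # (s, a, r, s', done)

    def sample(self, batch_size=64):
        return random.sample(self.replay, min(batch_size, len(self.replay)))

    def soft_update(self, tau=0.01):
        # Polyak averaging of the target value network (assumed update rule).
        for p, tp in zip(self.value.parameters(), self.target_value.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)
```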
“…In the inner loop, the Meta-PPO algorithm uses a small amount of data from a randomly chosen task τ as the learning data to update the model parameters, reducing the model's loss on task τ. In this loop, the model parameter updating process is the same as in the PPO algorithm proposed in [26]-[30]. The neural network of the algorithm learns from several batches of data on the randomly chosen tasks.…”
Section: Base Learner (mentioning)
confidence: 99%
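The inner-loop update the excerpt refers to is the standard PPO parameter update. The sketch below shows one such update on a batch drawn from a single task, using the clipped surrogate objective plus a value-function loss; the clip range, loss coefficients, and all tensor names are assumptions used only for illustration.

```python
# Hypothetical sketch of one PPO-style inner-loop update on a batch drawn
# from a single task, using the standard clipped surrogate objective.
# clip_eps, vf_coef, and variable names are illustrative assumptions.
import torch


def ppo_inner_update(policy, value, optimizer, batch, clip_eps=0.2, vf_coef=0.5):
    obs, actions, old_log_probs, returns, advantages = batch

    dist = torch.distributions.Categorical(logits=policy(obs))
    log_probs = dist.log_prob(actions)

    # Probability ratio between the updated and the old policy.
    ratio = torch.exp(log_probs - old_log_probs)

    # Clipped surrogate objective (maximized, so we minimize its negative).
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Value-function regression toward the empirical returns.
    value_loss = (value(obs).squeeze(-1) - returns).pow(2).mean()

    loss = policy_loss + vf_coef * value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeating this update over several batches from the sampled task, then measuring the post-update loss for the outer (meta) step, reflects the inner/outer structure the excerpt describes.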