2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)
DOI: 10.1109/devlrn.2019.8850699

Reward-Punishment Actor-Critic Algorithm Applying to Robotic Non-grasping Manipulation

Cited by 4 publications (2 citation statements) · References 16 publications
“…DMP [5], [6] argued for the necessity of such a learning rule in back-propagating the true amount of negative signals, while explaining how pain-avoidance induces sample efficiency and exploration. RP-AC [16] is a revamped reward-punishment Actor-Critic framework that formulates a policy gradient for continuous control. Split-Q Learning [17] suggested a more generalized reward-punishment framework by parameterizing immediate positive and negative rewards and their approximated state-action-values, aligning with various neurological and psychiatric mechanisms.…”
Section: A. Separating Reward and Punishment (mentioning)
confidence: 99%
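The citation statement above summarizes RP-AC [16] and Split-Q Learning [17] only in one line each. As a rough illustration of the separated reward-punishment idea, the sketch below keeps two critics, one for positive rewards and one for punishments, and drives a softmax policy with their combined TD errors. This is a minimal tabular sketch with assumed state/action encodings and learning rates, not the continuous-control policy gradient of [16] or the parameterization of [17].

# Minimal sketch (not the authors' exact RP-AC implementation): an actor-critic
# that learns separate value functions for reward and punishment and combines
# their TD errors in the policy update. Hyperparameters are illustrative.
import numpy as np

class RewardPunishmentActorCritic:
    def __init__(self, n_states, n_actions, lr_actor=0.01, lr_critic=0.1, gamma=0.99):
        self.theta = np.zeros((n_states, n_actions))   # softmax policy parameters
        self.v_pos = np.zeros(n_states)                # critic for positive rewards
        self.v_neg = np.zeros(n_states)                # critic for punishments
        self.lr_a, self.lr_c, self.gamma = lr_actor, lr_critic, gamma

    def policy(self, s):
        prefs = self.theta[s] - self.theta[s].max()    # numerically stable softmax
        p = np.exp(prefs)
        return p / p.sum()

    def act(self, s, rng):
        return rng.choice(self.theta.shape[1], p=self.policy(s))

    def update(self, s, a, r, s_next, done):
        r_pos, r_neg = max(r, 0.0), min(r, 0.0)        # split the scalar reward
        # TD errors for the two critics
        target_pos = r_pos + (0.0 if done else self.gamma * self.v_pos[s_next])
        target_neg = r_neg + (0.0 if done else self.gamma * self.v_neg[s_next])
        delta_pos = target_pos - self.v_pos[s]
        delta_neg = target_neg - self.v_neg[s]
        self.v_pos[s] += self.lr_c * delta_pos
        self.v_neg[s] += self.lr_c * delta_neg
        # policy gradient of log softmax, driven by the combined TD error
        delta = delta_pos + delta_neg
        grad = -self.policy(s)
        grad[a] += 1.0
        self.theta[s] += self.lr_a * delta * grad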
“…To address this issue, Bodnar et al. developed Quantile QT-Opt (Q2-Opt), a distributional variant of the distributed Q-learning algorithm, for continuous domains and evaluated its performance in both simulated and real vision-based robotic grasping tasks [132]. Additionally, Kobayashi et al. proposed a Reward-Punishment Actor-Critic (RP-AC) algorithm to optimize robot trajectories by acquiring suitable rewards [133], while Demura et al. used the You Only Look Once (YOLO) object detection approach to identify the optimal grasp point for stable manipulation in their Q-learning grasping motion acquisition method [134]. This technique enabled the robot to pick up the uppermost folded towel from a stack and place it on a table. Kim et al. demonstrated that deep learning-based techniques with direct visual input can achieve state-of-the-art results for robotic grasping in a cluttered environment with diverse unseen target objects [73].…”
mentioning
confidence: 99%