2019
DOI: 10.1109/tii.2018.2868859

Feedback Deep Deterministic Policy Gradient With Fuzzy Reward for Robotic Multiple Peg-in-Hole Assembly Tasks

Cited by 126 publications (50 citation statements)
References 14 publications
“…The deep reinforcement learning (DRL) is developed to learn a mapping in an end-to-end method and DRL methods apply the computational power of deep learning model to relaxing the curse of dimensionality in complex tasks [15]. A Deep Deterministic Policy Gradient (DDPG) is a practical RL algorithm and it learns a Q-value function and a policy [16]. The Bellman equation and the off-policy sample data are used to learn the Q-value function and the Q-value is used to train an agent to get a policy.…”
Section: Deep Reinforcement Learning (mentioning)
confidence: 99%
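The statement above describes the two learning signals in DDPG: a Bellman target for the Q-value function computed from stored off-policy transitions, and a policy update that follows the Q-value. A minimal sketch of both follows, assuming a toy one-dimensional state and action and linear stand-ins for the networks. The names `q_value`, `policy`, `bellman_target`, and `actor_gradient`, as well as all numeric values, are hypothetical and do not come from the cited or citing papers.

```python
# Toy 1-D illustration; the linear models below are assumed stand-ins
# for the critic and actor networks, not the authors' architecture.

def q_value(w, s, a):
    """Critic Q(s, a) = w[0] * s + w[1] * a (linear, for illustration only)."""
    return w[0] * s + w[1] * a


def policy(theta, s):
    """Deterministic actor mu(s) = theta * s."""
    return theta * s


def bellman_target(r, s_next, done, w, theta, gamma=0.99):
    """Bellman target y = r + gamma * Q(s', mu(s')); no bootstrap on terminal steps."""
    if done:
        return r
    return r + gamma * q_value(w, s_next, policy(theta, s_next))


def actor_gradient(w, theta, s):
    """Chain rule: dQ(s, mu(s))/dtheta = dQ/da * dmu/dtheta = w[1] * s."""
    return w[1] * s


# One stored off-policy transition (s, a, r, s_next, done); values are made up.
w, theta = [0.5, 2.0], 0.1
s, a, r, s_next, done = 1.0, 0.3, 1.0, 2.0, False

y = bellman_target(r, s_next, done, w, theta, gamma=0.9)
td = y - q_value(w, s, a)  # error the critic would be trained to reduce
theta_new = theta + 0.01 * actor_gradient(w, theta, s)  # ascend Q with respect to the actor
```

The action `a` in the transition need not come from the current policy, which is what makes the update off-policy; only the bootstrap term uses `policy(theta, s_next)`.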
“…Similarly, critic networks are the Q-value function evaluation network, including the target network and current network respectively. Critic updates network parameters using the TD error, as shown in Equation (16).…”
Section: Multi-agent DDPG Algorithm With Parameter Sharing (mentioning)
confidence: 99%
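Equation (16) of the citing paper is not reproduced in the statement above. As a rough illustration of a critic that keeps a current network and a target network and updates from the TD error, the sketch below assumes a linear critic, a semi-gradient step, and a soft (Polyak) target update with rate `tau`. The function `critic_step` and every hyperparameter value are hypothetical and may differ from the citing paper's formulation.

```python
def critic_step(w, w_target, transition, next_action, lr=0.1, gamma=0.9, tau=0.05):
    """One TD update of a linear critic Q(s, a) = w[0] * s + w[1] * a.

    `w` holds the current critic weights, `w_target` the target critic weights.
    `next_action` is assumed to come from a target actor evaluated at s_next.
    """
    s, a, r, s_next = transition
    q = w[0] * s + w[1] * a
    # The bootstrap value uses the target network, which keeps the target stable.
    q_next = w_target[0] * s_next + w_target[1] * next_action
    td_error = r + gamma * q_next - q
    # Semi-gradient step: move Q(s, a) toward the target along dQ/dw = (s, a).
    w_new = [w[0] + lr * td_error * s, w[1] + lr * td_error * a]
    # Soft update: the target weights slowly track the current weights.
    w_target_new = [(1 - tau) * wt + tau * wn for wt, wn in zip(w_target, w_new)]
    return w_new, w_target_new, td_error


# Made-up transition (s, a, r, s_next) starting from zero weights.
w_new, w_target_new, td_error = critic_step(
    [0.0, 0.0], [0.0, 0.0], (1.0, 1.0, 1.0, 1.0), next_action=1.0
)
```

With zero initial weights both Q estimates are zero, so the TD error equals the reward and the target weights move only a fraction `tau` of the way toward the updated critic.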
“…Xu et al [15] proposed learning dual peg insertion by using the deep deterministic policy gradient [16] (DDPG) algorithm with a fuzzy reward system. Similarly, Fan et al [17] used DDPG combined with guided policy search (GPS) [18] to learn high-precision assembly tasks.…”
Section: Introduction (mentioning)
confidence: 99%
“…[3, 4]. In the contact tasks of industrial robots, the contact force needs to be included [5, 6, 7, 8, 9]. The actions taken by a teacher can be perceived by sensors, such as visual sensors to capture a teacher’s body movements [10, 11] or recognize a teacher’s gestures [12], wearable sensors, and force sensors to perceive a teacher’s behavioral intentions [13, 14, 15].…”
Section: Introduction (mentioning)
confidence: 99%
“…The actions taken by a teacher can be perceived by sensors, such as visual sensors to capture a teacher’s body movements [10, 11] or recognize a teacher’s gestures [12], wearable sensors, and force sensors to perceive a teacher’s behavioral intentions [13, 14, 15]. Compared with visual sensors, wearable sensors, etc., force sensor-based kinesthetic teaching is suitable for non-professionals to tell the robot the action needed to be taken in current state in a simple and intuitive way [5, 6, 7, 13, 14, 15, 16].…”
Section: Introduction (mentioning)
confidence: 99%