2004
DOI: 10.1007/s00422-004-0485-3

Biological arm motion through reinforcement learning

Abstract: The present paper discusses an optimal learning control method using reinforcement learning for biological systems with a redundant actuator. It is difficult to apply reinforcement learning to biological control systems because of the redundancy in muscle activation space. We solve this problem with the following method. First, we divide the control input space into two subspaces according to a priority order of learning and restrict the search noise for reinforcement learning to the first priority subspace. T…
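The subspace decomposition described in the abstract can be illustrated with a small sketch (a hypothetical construction, not the paper's code): given an assumed moment-arm matrix A mapping redundant muscle activations to joint torques, exploration noise can be split into a torque-producing component (the row space of A) and a null-space component that only co-contracts, and the search noise can then be confined to the first of these subspaces. The matrix A, the baseline activation, and the activation bounds below are illustrative assumptions.

```python
import numpy as np

# Hypothetical moment-arm matrix mapping 6 muscle activations to 2 joint torques.
A = np.array([[1.0, -1.0, 0.0,  0.0, 0.5, -0.5],
              [0.0,  0.0, 1.0, -1.0, 0.5, -0.5]])

def split_noise(noise, A):
    """Split exploration noise into the torque-producing subspace (row space of A)
    and its null-space complement (pure co-contraction)."""
    A_pinv = np.linalg.pinv(A)
    torque_part = A_pinv @ (A @ noise)   # projection onto the torque-producing subspace
    null_part = noise - torque_part      # component that changes no joint torque
    return torque_part, null_part

rng = np.random.default_rng(0)
raw_noise = rng.normal(scale=0.1, size=A.shape[1])
torque_noise, cocontraction_noise = split_noise(raw_noise, A)

# Restrict exploration to the first-priority subspace by using only torque_noise.
u = np.clip(0.2 + torque_noise, 0.0, 1.0)          # baseline activation plus restricted noise
print(np.allclose(A @ cocontraction_noise, 0.0))   # True: the null-space part produces no torque
```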

Cited by 50 publications (31 citation statements); references 31 publications.
“…(14) and Eq. (15). Then, the state of the arm is updated with the Runge-Kutta method and the time changes from t to t + Δt.…”
Section: Discussion
confidence: 99%
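The excerpt above refers to advancing the arm state with the Runge-Kutta method from t to t + Δt. Below is a minimal sketch of a classical fourth-order Runge-Kutta step; the single-joint placeholder dynamics and the inertia and damping constants are assumptions for illustration, not values from the cited work.

```python
import numpy as np

def rk4_step(f, x, u, t, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x, u, t)."""
    k1 = f(x, u, t)
    k2 = f(x + 0.5 * dt * k1, u, t + 0.5 * dt)
    k3 = f(x + 0.5 * dt * k2, u, t + 0.5 * dt)
    k4 = f(x + dt * k3, u, t + dt)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def arm_dynamics(x, u, t):
    """Placeholder single-joint arm: x = [angle, angular velocity], u = torque."""
    inertia, damping = 0.05, 0.02          # assumed constants
    theta, omega = x
    return np.array([omega, (u - damping * omega) / inertia])

x, t, dt = np.array([0.0, 0.0]), 0.0, 0.01
x = rk4_step(arm_dynamics, x, 0.1, t, dt)  # state advanced from t to t + dt
t += dt
```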
“…Alternatively, the Actor-Critic method, one of the major frameworks for temporal-difference (TD) learning in reinforcement learning [13], is adapted to train the feedback controller in feedback-error-learning. Since it is difficult to imagine that we are born with a trained feedback controller, learning should be performed in a trial-and-error manner to acquire it [14], [15]. Finally, an FDM is used in the feedback path to overcome sensory delays.…”
Section: Introduction
confidence: 99%
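The excerpt uses an FDM in the feedback path to overcome sensory delays; reading FDM as a forward dynamics model, one common way such a model can compensate a delay is to replay the recent motor commands through the model starting from the delayed observation. The sketch below is an assumption about that mechanism, not the cited implementation; the delay length, time step, and stand-in dynamics are placeholders.

```python
from collections import deque
import numpy as np

dt = 0.01
DELAY_STEPS = 5          # assumed sensory delay, measured in control steps

def forward_model(x, u):
    """Stand-in for a learned forward dynamics model of a single joint:
    x = [angle, angular velocity], u = torque.  One crude Euler step."""
    inertia, damping = 0.05, 0.02
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * (u - damping * omega) / inertia])

recent_commands = deque(maxlen=DELAY_STEPS)   # torques issued during the delay window

def estimate_current_state(delayed_observation):
    """Compensate the sensory delay by replaying the recent motor commands
    through the forward model, starting from the delayed observation."""
    x = np.asarray(delayed_observation, dtype=float)
    for u in recent_commands:
        x = forward_model(x, u)
    return x

# Usage: after sending each torque command, record it, then estimate the present state.
recent_commands.append(0.1)
x_now = estimate_current_state([0.0, 0.0])
```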
“…The learning problem is simplified by generating exploration noise along two subspaces, one in which the joint stiffness remains constant and another in which it does not. Most relevant to this article is the fact that the agent learns to increase the impedance through co-contraction when the arm is perturbed with a force of randomly varying orientation [9] or strength [10]. However, applying this approach to higher-dimensional systems or real robots might be challenging, as it requires tuning the time constant for learning, appropriately setting the bias for the initial stiffness, and determining the appropriate neural network structure.…”
Section: B. Variable Impedance Control in Robotics
confidence: 99%
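The two exploration subspaces described in the excerpt can be made concrete for a single antagonistic muscle pair. Assuming, purely for illustration, that joint torque is proportional to the difference of flexor and extensor activations and joint stiffness to their sum, noise along the reciprocal direction leaves the stiffness constant while noise along the co-contraction direction changes it. The proportionality assumptions and noise scales below are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# For one flexor/extensor pair, assume torque ∝ (u_f - u_e) and stiffness ∝ (u_f + u_e).
reciprocal_dir = np.array([1.0, -1.0]) / np.sqrt(2.0)     # stiffness-preserving direction
cocontraction_dir = np.array([1.0, 1.0]) / np.sqrt(2.0)   # stiffness-changing direction

def explore(u, sigma_reciprocal=0.05, sigma_cocontraction=0.0):
    """Add exploration noise separately along the two subspaces."""
    noise = (rng.normal(scale=sigma_reciprocal) * reciprocal_dir
             + rng.normal(scale=sigma_cocontraction) * cocontraction_dir)
    return np.clip(u + noise, 0.0, 1.0)

u = np.array([0.3, 0.3])                       # baseline flexor/extensor activations
u_stiff_const = explore(u)                     # explores torque without changing stiffness
u_full = explore(u, sigma_cocontraction=0.05)  # also explores impedance via co-contraction
```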
“…An early model-free impedance learning approach was presented in [10], where a simulated 2-DOF robot arm with antagonistic muscles learns to reach for a target object whilst minimizing motor commands. The reinforcement learning algorithm is implemented as an actor-critic architecture, where the critic learns the value function by minimizing the temporal-difference error, and the actor determines the muscle forces.…”
Section: B. Variable Impedance Control in Robotics
confidence: 99%
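The actor-critic arrangement described in the excerpt, a critic that reduces the temporal-difference error of a value function and an actor that outputs muscle activations, can be sketched with linear function approximation. The feature encoding, learning rates, Gaussian exploration, and the noise-weighted actor update below are assumptions for illustration, not the architecture used in the cited paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_features, n_muscles = 8, 6
gamma, alpha_critic, alpha_actor, sigma = 0.95, 0.1, 0.05, 0.1

w_critic = np.zeros(n_features)               # linear value-function weights
W_actor = np.zeros((n_muscles, n_features))   # linear policy weights

def features(state):
    """Hypothetical feature encoding; assumes the state is already an n_features-length vector."""
    return np.tanh(np.asarray(state, dtype=float))

def select_action(state):
    """Actor: mean muscle activations plus Gaussian exploration noise."""
    phi = features(state)
    noise = rng.normal(scale=sigma, size=n_muscles)
    u = np.clip(W_actor @ phi + noise, 0.0, 1.0)
    return u, noise

def update(state, noise, reward, next_state):
    """Critic reduces the TD error of V(s) = w·phi(s); the actor reinforces
    the executed exploration noise in proportion to that TD error."""
    global w_critic, W_actor
    phi = features(state)
    td_error = reward + gamma * w_critic @ features(next_state) - w_critic @ phi
    w_critic = w_critic + alpha_critic * td_error * phi
    W_actor = W_actor + alpha_actor * td_error * np.outer(noise, phi)
    return td_error

# Usage sketch: u, noise = select_action(s); apply u, observe r and s_next; update(s, noise, r, s_next)
```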
“…We adopt the Actor-Critic method [14] in order to acquire a feedback controller for arm reaching. Although we are not the first to apply the Actor-Critic method to reaching tasks, previous models only explain reaching movements toward one particular target [15]. In our daily life, we are not always reaching toward the same target.…”
Section: Introduction
confidence: 99%