2017 First IEEE International Conference on Robotic Computing (IRC)
DOI: 10.1109/irc.2017.33

Active Exploration and Parameterized Reinforcement Learning Applied to a Simulated Human-Robot Interaction Task

Abstract: Online model-free reinforcement learning (RL) methods with continuous actions play a prominent role in real-world applications such as robotics. However, when confronted with non-stationary environments, these methods crucially rely on an exploration-exploitation tradeoff that is rarely adjusted dynamically and automatically to changes in the environment. Here we propose an active exploration algorithm for RL in a structured (parameterized) continuous action space. This framework deals with …

Cited by 25 publications (18 citation statements)
References 20 publications
“…Masson et al [34] handled discrete actions with Q-learning and policy search for continuous actions. Similarly, Khamassi et al [35] use Q-learning and policy gradient to achieve the same results. Those methods assume on-policy learning and handle discrete and continuous actions separately.…”
Section: Related Work
confidence: 96%
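The statement above contrasts methods that treat the discrete action choice and its continuous parameter separately: value-based learning (Q-learning) for the discrete part and policy search or policy gradient for the continuous part. The following is a minimal sketch of that separation in a single-state, bandit-like setting; the class name, hyperparameters, and Gaussian policy form are illustrative assumptions, not taken from Masson et al. [34] or Khamassi et al. [35].

```python
import numpy as np

class ParameterizedActionAgent:
    """Sketch of a parameterized action space handled in two parts:
    Q-learning over discrete actions, plus a REINFORCE-style Gaussian
    policy gradient over each action's continuous parameter."""

    def __init__(self, n_actions, alpha_q=0.1, alpha_pi=0.01, beta=3.0, sigma=0.5):
        self.q = np.zeros(n_actions)    # value of each discrete action
        self.mu = np.zeros(n_actions)   # mean of each action's continuous parameter
        self.alpha_q, self.alpha_pi = alpha_q, alpha_pi
        self.beta, self.sigma = beta, sigma

    def act(self):
        # Discrete choice: Boltzmann (softmax) over Q-values.
        prefs = self.beta * self.q
        prefs -= prefs.max()
        probs = np.exp(prefs) / np.exp(prefs).sum()
        a = np.random.choice(len(self.q), p=probs)
        # Continuous parameter: sample from a Gaussian policy for action a.
        theta = np.random.normal(self.mu[a], self.sigma)
        return a, theta

    def learn(self, a, theta, reward):
        # Bandit-form Q-learning update for the discrete action.
        self.q[a] += self.alpha_q * (reward - self.q[a])
        # REINFORCE update of the parameter mean:
        # grad log pi(theta) = (theta - mu) / sigma^2 for a Gaussian policy.
        self.mu[a] += self.alpha_pi * reward * (theta - self.mu[a]) / self.sigma**2
```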
“…However, the training time for the epsilon-greedy strategy is proportional to the scale of the state space and action space [12] [13]. Another common method for exploration-exploitation is the Boltzmann exploration strategy [14] [15] [16]. The Boltzmann exploration strategy guides a robot to select an action with a probability that depends on the value function, while a temperature parameter restrains the randomness of action selection.…”
Section: The Exploration-Exploitation Dilemma in Obstacle Avoidance
confidence: 99%
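For concreteness, here is a minimal sketch of the two exploration strategies compared in the statement above: Boltzmann (softmax) selection, where the temperature controls how strongly action probabilities follow the value estimates, and an epsilon-greedy baseline. The function names and the bandit-style interface are illustrative assumptions.

```python
import numpy as np

def boltzmann_action(q_values, temperature):
    """Sample an action with probability proportional to exp(Q(a) / temperature).
    High temperature -> near-uniform exploration; low -> near-greedy exploitation."""
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()                      # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return np.random.choice(len(probs), p=probs)

def epsilon_greedy_action(q_values, epsilon):
    """Epsilon-greedy baseline: explore uniformly with probability epsilon."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))
```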
“…In previous work, we have applied this meta-learning principle in an algorithm here referred to as MLB to dynamically tune β_t in a simple multi-armed bandit scenario involving the interaction between a simulated human and a robot [10]. Formally, function F is a Boltzmann softmax with parameter φ = 0 (i.e.…”
Section: Problem Formulation and Algorithms
confidence: 99%
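The MLB idea referenced above is to meta-learn the inverse temperature β_t of the Boltzmann softmax online rather than fixing it by hand. The sketch below assumes a common form of this meta-learning rule: β_t is raised when a fast (mid-term) running average of reward exceeds a slow (long-term) one, and lowered otherwise. The exact update constants are illustrative, not taken from the cited paper.

```python
import numpy as np

class MLBTemperature:
    """Sketch of meta-learned inverse temperature beta_t, assuming an update
    that compares mid-term and long-term running averages of reward."""

    def __init__(self, beta0=1.0, tau_mid=0.1, tau_long=0.01, eta=1.0,
                 beta_min=0.0, beta_max=50.0):
        self.beta = beta0
        self.r_mid = 0.0       # fast (mid-term) reward average
        self.r_long = 0.0      # slow (long-term) reward average
        self.tau_mid, self.tau_long, self.eta = tau_mid, tau_long, eta
        self.beta_min, self.beta_max = beta_min, beta_max

    def update(self, reward):
        # Exponential moving averages over two time scales.
        self.r_mid += self.tau_mid * (reward - self.r_mid)
        self.r_long += self.tau_long * (reward - self.r_long)
        # Recent rewards above the long-term baseline -> exploit more (raise beta);
        # recent rewards below it -> explore more (lower beta).
        self.beta += self.eta * (self.r_mid - self.r_long)
        self.beta = float(np.clip(self.beta, self.beta_min, self.beta_max))
        return self.beta
```

The resulting β_t would then be used in the Boltzmann softmax, i.e. action probabilities proportional to exp(β_t · Q(a)).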
“…The mid- and long-term rewards are also calculated as in [8] and used to update the inverse temperature parameter β_t according to the update rule of MLB [10]. When the uncertainty of an arm's action value increases, the respective arm should be explored more.…”
Section: Hybrid Meta-Learning with Kalman Filters (MLB-KF)
confidence: 99%
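The hybrid MLB-KF variant described above tracks each arm's action value with a Kalman filter, so that the filter's posterior variance provides an explicit uncertainty signal, while β_t is still meta-learned from mid- and long-term rewards. The sketch below reuses the MLBTemperature class from the previous example; the specific way uncertainty modulates exploration here (shrinking the effective β when average uncertainty is high) is an illustrative coupling, not the exact MLB-KF rule, and `reward_fn` is a hypothetical environment callback.

```python
import numpy as np

class KalmanArm:
    """Scalar Kalman filter tracking one arm's value mean and its variance."""

    def __init__(self, mean=0.0, var=1.0, process_noise=0.01, obs_noise=1.0):
        self.mean, self.var = mean, var
        self.process_noise, self.obs_noise = process_noise, obs_noise

    def predict(self):
        # Non-stationarity: every arm's uncertainty grows at each step.
        self.var += self.process_noise

    def update(self, reward):
        # Standard Kalman correction after observing a reward for this arm.
        gain = self.var / (self.var + self.obs_noise)
        self.mean += gain * (reward - self.mean)
        self.var *= (1.0 - gain)


def mlb_kf_step(arms, meta_beta, reward_fn, uncertainty_scale=1.0):
    """One bandit step: softmax over Kalman means, with an effective beta
    reduced when average posterior variance is high (more exploration)."""
    for arm in arms:
        arm.predict()
    means = np.array([a.mean for a in arms])
    mean_unc = np.mean([a.var for a in arms])
    beta_eff = meta_beta.beta / (1.0 + uncertainty_scale * mean_unc)
    prefs = beta_eff * means
    prefs -= prefs.max()
    probs = np.exp(prefs) / np.exp(prefs).sum()
    choice = np.random.choice(len(arms), p=probs)
    reward = reward_fn(choice)
    arms[choice].update(reward)
    meta_beta.update(reward)       # MLB meta-learning of beta_t
    return choice, reward
```

A run would look like `arms = [KalmanArm() for _ in range(4)]`, `meta = MLBTemperature()`, then calling `mlb_kf_step(arms, meta, reward_fn)` in a loop against a (possibly non-stationary) reward function.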