Making Use of Unelaborated Advice to Improve Reinforcement Learning: A Mobile Robotics Approach

Moreno, David; Regueiro, Carlos V.; Iglesias, Roberto; Barro, Senén

doi:10.1007/11551188_10

Cited by 16 publications

(17 citation statements)

References 7 publications


“…Another solution to this problem is including prior knowledge into the reinforcement learning process. These researches include: the dynamic knowledge-orientated creation of the state space (Hailu, 2001) and the focalization of exploration (Lin, 1992; Millán, Posenato, & Dedieu, 2002; Moreno, Regueiro, & Iglesias, 2004).…”

Section: Related Researches (mentioning)

confidence: 99%


Dynamic packaging in e-retailing with stochastic demand over finite horizons: A Q-learning approach

Cheng

2009

Expert Systems with Applications



“…The objective of this section is to determine the probability of taking a decision through integrating the above exploitation/exploration policy with the SRL model proposed by Moreno et al (2004). The main idea is as follows (for detail see Moreno et al, 2004).…”

Section: Decision Selection Block (mentioning)

confidence: 99%

Dynamic packaging in e-retailing with stochastic demand over finite horizons: A Q-learning approach

Cheng

2009

Expert Systems with Applications


“…In order to speed up the convergence, Singer and Veloso [14] propose to solve the new problem via inducing the local features of original problem; Hailu and Sommer [15] discuss the effects of different bias information on learning speed by introducing environment information. Moreno et al [16] propose to introduce prior in supervised reinforcement learning; Lin and Li [17] build a reinforcement learning model based on latent bias; and Fernández and Veloso [18] reuse past learnt bias to supervise the solving of similar tasks. The above approaches use bias to supervise the selection of strategies from actions, can utilize the bias from external environment or past tasks, and thus the learning speed is accelerated.…”

Section: Related Work (mentioning)

confidence: 99%

Hierarchical Reinforcement Learning Based Self-balancing Algorithm for Two-wheeled Robots

Yan

Yang

2016

TOEEJ


Self-balancing control is the basis for applications of two-wheeled robots. In order to improve the self-balancing of two-wheeled robots, we propose a hierarchical reinforcement learning algorithm for controlling the balance of two-wheeled robots. After describing the subgoals of hierarchical reinforcement learning, we extract features for subgoals, define a feature value vector and its corresponding weight vector, and propose a reward function with an additional subgoal reward function. Finally, we give a hierarchical reinforcement learning algorithm for finding the optimal strategy. Simulation experiments show that the proposed algorithm is more effective than the traditional reinforcement learning algorithm in convergence speed, so in our system the robots can become self-balanced very quickly.


“…Additionally, many current advice-taking systems [5], [10] require that the human encode her advice into a scripting or programming language, making it inaccessible to non-technical users.…”

Section: A. Advice-taking Agents (mentioning)

confidence: 99%

TAMER: Training an Agent Manually via Evaluative Reinforcement

Knox

Stone

2008

2008 7th IEEE International Conference on Development and Learning


Abstract: Though computers have surpassed humans at many tasks, especially computationally intensive ones, there are many tasks for which human expertise remains necessary and/or useful. For such tasks, it is desirable for a human to be able to transmit knowledge to a learning agent as quickly and effortlessly as possible, and, ideally, without any knowledge of the details of the agent's learning process. This paper proposes a general framework called Training an Agent Manually via Evaluative Reinforcement (TAMER) that allows a human to train a learning agent to perform a common class of complex tasks simply by giving scalar reward signals in response to the agent's observed actions. Specifically, in sequential decision making tasks, an agent models the human's reward function and chooses actions that it predicts will receive the most reward. Our novel algorithm is fully implemented and tested on the game Tetris. Leveraging the human trainers' feedback, the agent learns to clear an average of more than 50 lines by its third game, an order of magnitude faster than the best autonomous learning agents.
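
The mechanism this abstract describes (fit a model of the human's scalar feedback, then act greedily on its predictions) can be sketched roughly as follows. This is not the TAMER authors' implementation: the linear model, the `one_hot_features` function, the step size, and the toy two-action task are all assumptions chosen only for illustration.

```python
from typing import Callable, List, Sequence


class HumanRewardModel:
    """Hypothetical linear model of a human trainer's scalar feedback."""

    def __init__(self, n_features: int, step_size: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.step_size = step_size  # assumed learning rate

    def predict(self, phi: Sequence[float]) -> float:
        # Predicted human reward for a state-action feature vector.
        return sum(w * x for w, x in zip(self.weights, phi))

    def update(self, phi: Sequence[float], human_reward: float) -> None:
        # One gradient step on the squared prediction error.
        error = human_reward - self.predict(phi)
        self.weights = [
            w + self.step_size * error * x for w, x in zip(self.weights, phi)
        ]


def choose_action(
    model: HumanRewardModel,
    state: int,
    actions: Sequence[int],
    features: Callable[[int, int], List[float]],
) -> int:
    # Greedy choice: the action with the highest predicted human reward.
    return max(actions, key=lambda a: model.predict(features(state, a)))


def one_hot_features(state: int, action: int) -> List[float]:
    # Toy features (assumption): two actions, state ignored.
    phi = [0.0, 0.0]
    phi[action] = 1.0
    return phi


def demo():
    # Simulated trainer (assumption): punishes action 0, rewards action 1.
    model = HumanRewardModel(n_features=2)
    for _ in range(20):
        model.update(one_hot_features(0, 0), -1.0)
        model.update(one_hot_features(0, 1), +1.0)
    return model, choose_action(model, 0, [0, 1], one_hot_features)
```

In this toy setting the agent never sees an environment reward; the only learning signal is the simulated trainer's feedback, which is the property the abstract emphasizes.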

