Deep Reinforcement Learning 2020
DOI: 10.1007/978-981-15-4095-0_7
Challenges of Reinforcement Learning

Cited by 65 publications (76 citation statements)
References 14 publications
“…RL methods suffer from scalability issues because of the large number of states [29]. We evaluate the scalability of our algorithms for different numbers of layers in the network topology shown in Figure 3.…”
Section: G. Scalability (mentioning)
Confidence: 99%
“…In recent years, various off-policy RL algorithms have been successfully applied and have shown significant performance improvements on challenging tasks, from classic Atari games and Go [1]-[5] to robotic control environments such as MuJoCo [6]-[13] and real-world implementations of robotic control [9]. However, two major challenges remain in off-policy RL: exploration of a large state space and efficient utilization of the stored experiences [14], [15]. Exploration focuses on how to make an RL agent encounter new and diverse experiences, while experience utilization addresses how the agent can extract full knowledge from the experiences it has already stored.…”
Section: Introduction (mentioning)
Confidence: 99%
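To make the exploration challenge concrete, here is a minimal sketch of epsilon-greedy action selection, one standard way to push an agent toward new experiences; it is illustrative only and not drawn from the cited works, and the function name and parameters are our own.

```python
import random

def epsilon_greedy_action(q_values, epsilon=0.1):
    """Choose an action index from estimated action values.

    With probability `epsilon` the agent explores (random action); otherwise
    it exploits the current value estimates. In practice `epsilon` is often
    annealed from ~1.0 toward a small constant as training progresses.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))                    # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit
```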
“…This sequential process is called experience replay (ER) [16], which is widely used in off-policy RL algorithms with a replay buffer (RB) large enough to store diverse experience samples over a wide interval. However, applying ER to off-policy RL still suffers from sampling inefficiency [14], [17]: although the agent needs to draw useful experience tuples from the RB throughout training to develop an optimal policy, sampling can be inefficient with conventional techniques such as uniform sampling, especially in the early stage of learning while the RB is still being filled. In fact, uniform sampling yields a relatively high sampling frequency for experience tuples stored early in the RB, because earlier tuples fall inside the sampling window many more times than later ones [18], [19].…”
Section: Introduction (mentioning)
Confidence: 99%
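A minimal replay buffer with uniform sampling makes this bias visible: a tuple stored at step 0 is eligible for nearly every sampling call during training, while a tuple stored late is eligible for only a few. This is a sketch under our own naming, not the implementation from the cited papers.

```python
import random
from collections import Counter, deque

class ReplayBuffer:
    """Minimal experience replay (ER) buffer with uniform sampling."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest tuples are evicted first

    def add(self, transition):
        # transition is a (state, action, reward, next_state, done, ...) tuple
        self.storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling: every stored tuple is equally likely *right now*,
        # but early tuples take part in far more sampling calls over training.
        return random.sample(list(self.storage), batch_size)


# Rough illustration of the bias toward early tuples while the buffer fills.
buffer = ReplayBuffer(capacity=5_000)
hits = Counter()
for step in range(5_000):
    buffer.add(("state", "action", 0.0, "next_state", False, step))
    if step >= 32:
        for transition in buffer.sample(32):
            hits[transition[-1]] += 1   # count draws per insertion step
print(f"tuple from step 0 drawn {hits[0]} times; "
      f"tuple from step 4000 drawn {hits[4000]} times")
```

Running the snippet shows the early tuple drawn roughly an order of magnitude more often than the late one, which is the over-sampling effect the excerpt describes.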
“…Hence, multiple policies, one correlated with each task in person-following robot development, must be obtained during the training process. Given the potential of deep reinforcement learning (DRL) [7], [8], the combination of deep learning (DL) and reinforcement learning (RL), for training and acquiring optimal robot policies, applying it to develop a person-following robot is a compelling option. With DRL, a specific policy, represented as a DL model, can be trained directly without first collecting enormous labeled datasets.…”
Section: Introduction (mentioning)
Confidence: 99%