2020
DOI: 10.1109/access.2020.3033016

A Framework for DRL Navigation With State Transition Checking and Velocity Increment Scheduling

Abstract: To train a mobile robot to navigate with an end-to-end approach that maps sensor data to actions, we can use a deep reinforcement learning (DRL) method by providing training environments with proper reward functions. Although some studies have shown the success of DRL in navigation tasks for mobile robots, the method needs appropriate hyperparameter settings, such as the environment's timestep size and the robot's velocity range, to produce a good navigation policy. The previously existing DRL framework has propose…
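As a rough, illustrative sketch only (not the paper's implementation), the snippet below shows how an end-to-end navigation environment might expose the timestep size and the robot's velocity range as hyperparameters while mapping raw sensor data to velocity actions; all class, method, and parameter names here are assumptions.

```python
import numpy as np

# Minimal sketch of an end-to-end DRL navigation environment (illustrative only).
# The timestep size and velocity limits are the hyperparameters the abstract refers to.
class NavEnv:
    def __init__(self, timestep=0.1, v_max=0.5, w_max=1.0, n_beams=24):
        self.dt = timestep        # environment timestep size (s)
        self.v_max = v_max        # max linear velocity (m/s)
        self.w_max = w_max        # max angular velocity (rad/s)
        self.n_beams = n_beams    # number of range-sensor beams
        self.reset()

    def reset(self):
        self.pose = np.zeros(3)                   # x, y, heading
        self.goal = np.array([4.0, 0.0])
        return self._observe()

    def _observe(self):
        # Observation = raw sensor data plus relative goal (end-to-end input).
        ranges = np.full(self.n_beams, 3.0)       # placeholder range scan
        rel_goal = self.goal - self.pose[:2]
        return np.concatenate([ranges, rel_goal])

    def step(self, action):
        # Action in [-1, 1]^2 is scaled into the robot's velocity range.
        v = np.clip(action[0], -1.0, 1.0) * self.v_max
        w = np.clip(action[1], -1.0, 1.0) * self.w_max
        x, y, th = self.pose
        self.pose = np.array([x + v * np.cos(th) * self.dt,
                              y + v * np.sin(th) * self.dt,
                              th + w * self.dt])
        dist = np.linalg.norm(self.goal - self.pose[:2])
        reward = -dist * self.dt                  # simple dense reward placeholder
        done = dist < 0.2
        return self._observe(), reward, done
```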

Cited by 6 publications (6 citation statements) · References 41 publications
“…As also shown in Table 1, we use different reward functions inside the navigation and the attending environment since all environments have dissimilar objectives. We employ a reward function which is based on the artificial potential field (APF) inside the navigation environment so that the robot is capable of avoiding obstacles and able to reach the target person [15], [42], [43]. On the other hand, we apply U-Shaped reward function [22] inside the attending environment so that the robot is able to attend the target person at his left or at his right side as depicted in Fig.…”
Section: A. Person-Following Robot Environment
confidence: 99%
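For orientation only, here is a minimal sketch of an artificial-potential-field (APF) style reward of the kind referenced in the statement above; the gains k_att and k_rep and the influence distance d0 are illustrative assumptions, not values from the cited work.

```python
import numpy as np

# APF-style reward sketch (illustrative; not the cited papers' exact formulation).
def apf_reward(robot_xy, goal_xy, obstacles_xy, k_att=1.0, k_rep=0.5, d0=1.0):
    """Return the negative potential: attraction to the goal, repulsion near obstacles."""
    d_goal = np.linalg.norm(goal_xy - robot_xy)
    u_att = 0.5 * k_att * d_goal ** 2                      # attractive potential
    u_rep = 0.0
    for obs in obstacles_xy:
        d = max(np.linalg.norm(obs - robot_xy), 1e-6)      # avoid division by zero
        if d < d0:                                         # repel only inside influence radius
            u_rep += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return -(u_att + u_rep)
```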
“…Since we have already obtained the optimal navigation policy π_nav from our prior study [15], we do not need to perform the training for the navigation task anymore. Instead, we reuse and integrate the policy with the left attending policy and the right attending policy which are obtained in this study.…”
Section: A. Experiments Setup
confidence: 99%
“…An environment that follows an MDP is needed to train DRL agents [43], which is formulated by conceptualizing the dynamic relationship between the RSUs and UAVs as a Stackelberg game. This strategic interaction is then formally cast as a multi-agent MDP.…”
Section: A. MDP for the Stackelberg Game Between RSUs and UAVs
confidence: 99%
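As a generic reminder (not the citing paper's specific formulation), a multi-agent MDP for N agents can be written as the tuple

$$\mathcal{M} = \bigl(\mathcal{S}, \{\mathcal{A}_i\}_{i=1}^{N}, P, \{R_i\}_{i=1}^{N}, \gamma\bigr), \qquad P : \mathcal{S} \times \mathcal{A}_1 \times \cdots \times \mathcal{A}_N \to \Delta(\mathcal{S}),$$

where each agent $i$ chooses actions from $\mathcal{A}_i$ and receives reward $R_i$, and $\gamma \in [0,1)$ discounts future returns; in a Stackelberg setting, the leader commits to its policy first and the followers best-respond.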