2020
DOI: 10.1109/access.2020.3033016

A Framework for DRL Navigation With State Transition Checking and Velocity Increment Scheduling

Abstract: To train a mobile robot to navigate with an end-to-end approach that maps sensor data to actions, we can use a deep reinforcement learning (DRL) method by providing training environments with proper reward functions. Although some studies have shown the success of DRL in navigation tasks for mobile robots, the method needs appropriate hyperparameter settings, such as the environment's timestep size and the robot's velocity range, to produce a good navigation policy. The previously existing DRL framework has propose…
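As a rough, illustrative sketch only (not the paper's implementation), the snippet below shows how an end-to-end navigation environment might expose the timestep size and the robot's velocity range as hyperparameters while mapping raw sensor data to velocity actions; all class, method, and parameter names here are assumptions.

```python
import numpy as np

# Minimal sketch of an end-to-end DRL navigation environment (illustrative only).
# The timestep size and velocity limits are the hyperparameters the abstract refers to.
class NavEnv:
    def __init__(self, timestep=0.1, v_max=0.5, w_max=1.0, n_beams=24):
        self.dt = timestep        # environment timestep size (s)
        self.v_max = v_max        # max linear velocity (m/s)
        self.w_max = w_max        # max angular velocity (rad/s)
        self.n_beams = n_beams    # number of range-sensor beams
        self.reset()

    def reset(self):
        self.pose = np.zeros(3)                   # x, y, heading
        self.goal = np.array([4.0, 0.0])
        return self._observe()

    def _observe(self):
        # Observation = raw sensor data plus relative goal (end-to-end input).
        ranges = np.full(self.n_beams, 3.0)       # placeholder range scan
        rel_goal = self.goal - self.pose[:2]
        return np.concatenate([ranges, rel_goal])

    def step(self, action):
        # Action in [-1, 1]^2 is scaled into the robot's velocity range.
        v = np.clip(action[0], -1.0, 1.0) * self.v_max
        w = np.clip(action[1], -1.0, 1.0) * self.w_max
        x, y, th = self.pose
        self.pose = np.array([x + v * np.cos(th) * self.dt,
                              y + v * np.sin(th) * self.dt,
                              th + w * self.dt])
        dist = np.linalg.norm(self.goal - self.pose[:2])
        reward = -dist * self.dt                  # simple dense reward placeholder
        done = dist < 0.2
        return self._observe(), reward, done
```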

Cited by 6 publications (6 citation statements) · References 41 publications
“…As also shown in Table 1, we use different reward functions inside the navigation and the attending environment since all environments have dissimilar objectives. We employ a reward function which is based on the artificial potential field (APF) inside the navigation environment so that the robot is capable of avoiding obstacles and able to reach the target person [15], [42], [43]. On the other hand, we apply U-Shaped reward function [22] inside the attending environment so that the robot is able to attend the target person at his left or at his right side as depicted in Fig.…”
Section: A. Person-Following Robot Environment
confidence: 99%
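For orientation only, here is a minimal sketch of an artificial-potential-field (APF) style reward of the kind referenced in the statement above; the gains k_att and k_rep and the influence distance d0 are illustrative assumptions, not values from the cited work.

```python
import numpy as np

# APF-style reward sketch (illustrative; not the cited papers' exact formulation).
def apf_reward(robot_xy, goal_xy, obstacles_xy, k_att=1.0, k_rep=0.5, d0=1.0):
    """Return the negative potential: attraction to the goal, repulsion near obstacles."""
    d_goal = np.linalg.norm(goal_xy - robot_xy)
    u_att = 0.5 * k_att * d_goal ** 2                      # attractive potential
    u_rep = 0.0
    for obs in obstacles_xy:
        d = max(np.linalg.norm(obs - robot_xy), 1e-6)      # avoid division by zero
        if d < d0:                                         # repel only inside influence radius
            u_rep += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return -(u_att + u_rep)
```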
“…Since we have already obtained the optimal navigation policy π_nav from our prior study [15], we do not need to perform the training for the navigation task anymore. Instead, we reuse and integrate the policy with the left attending policy and the right attending policy which are obtained in this study.…”
Section: A. Experiments Setup
confidence: 99%
“…An environment that follows an MDP is needed to train DRL agents [43], which is formulated by conceptualizing the dynamic relationship between the RSUs and UAVs as a Stackelberg game. This strategic interaction is then formally cast as a multi-agent MDP.…”
Section: A. MDP for the Stackelberg Game Between RSUs and UAVs
confidence: 99%
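As a generic reminder (not the citing paper's specific formulation), a multi-agent MDP for N agents can be written as the tuple

$$\mathcal{M} = \bigl(\mathcal{S}, \{\mathcal{A}_i\}_{i=1}^{N}, P, \{R_i\}_{i=1}^{N}, \gamma\bigr), \qquad P : \mathcal{S} \times \mathcal{A}_1 \times \cdots \times \mathcal{A}_N \to \Delta(\mathcal{S}),$$

where each agent $i$ chooses actions from $\mathcal{A}_i$ and receives reward $R_i$, and $\gamma \in [0,1)$ discounts future returns; in a Stackelberg setting, the leader commits to its policy first and the followers best-respond.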