2021
DOI: 10.48550/arxiv.2104.10403
Preprint

Model-aided Deep Reinforcement Learning for Sample-efficient UAV Trajectory Design in IoT Networks

Omid Esrafilian, Harald Bayerlein, David Gesbert

Abstract: Deep Reinforcement Learning (DRL) is gaining attention as a potential approach to design trajectories for autonomous unmanned aerial vehicles (UAVs) used as flying access points in the context of cellular or Internet of Things (IoT) connectivity. DRL solutions offer the advantage of on-the-go learning, hence relying on very little prior contextual information. A corresponding drawback, however, lies in the need for many learning episodes, which severely restricts the applicability of such an approach in real-world tim…
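For intuition about the sample-efficiency argument in the abstract, the sketch below shows a generic Dyna-style loop in which a learned transition model generates synthetic updates alongside real ones, so fewer real flight episodes are needed. It illustrates the general model-aided idea only, not the authors' algorithm; the grid size, hyperparameters, and environment API are all assumptions.

```python
import numpy as np

# Hedged illustration only: a generic Dyna-Q loop, not the paper's method.
# A learned model of the environment supplies extra (simulated) transitions,
# which is where the sample-efficiency gain over plain model-free RL comes
# from. All constants below are assumptions for the sketch.

N_STATES, N_ACTIONS = 25, 4          # assumed 5x5 grid world, 4 moves
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1   # assumed learning hyperparameters
PLANNING_STEPS = 20                  # model rollouts per real step

Q = np.zeros((N_STATES, N_ACTIONS))
model = {}                           # (s, a) -> (r, s'), learned from data

def dyna_q_step(env, s, rng):
    """One real step plus PLANNING_STEPS simulated updates (assumed env API)."""
    # epsilon-greedy action in the real environment
    a = int(rng.integers(N_ACTIONS)) if rng.random() < EPS else int(Q[s].argmax())
    r, s_next = env.step(s, a)       # assumed: returns (reward, next_state)

    # direct RL update from the real transition
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])

    # model learning: remember the observed transition
    model[(s, a)] = (r, s_next)

    # planning: replay simulated transitions drawn from the learned model
    keys = list(model)
    for _ in range(PLANNING_STEPS):
        ms, ma = keys[rng.integers(len(keys))]
        mr, ms_next = model[(ms, ma)]
        Q[ms, ma] += ALPHA * (mr + GAMMA * Q[ms_next].max() - Q[ms, ma])

    return s_next
```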


Cited by 2 publications (2 citation statements)
References 11 publications
“…Model-aided Deep Reinforcement Learning (Q-learning) [31] and MARL with Deep Q-Learning [32] approaches; RL and deep RL have been used for UAV navigation because of their ability to learn directly through interaction with the surrounding environment [24], [39]-[43]. When the environment has a grid-world representation (e.g., indoors), Q-learning represents a simple and optimal solution because state-action pairs can be represented by a tractable Q-table that is updated at each time instant according to the received rewards [4], [14], [44].…”
Section: Applications, Optimization Objective, Techniques (mentioning)
confidence: 99%
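The tractable Q-table update the excerpt refers to is the standard tabular temporal-difference rule; a minimal sketch follows, with the grid dimensions, reward, and hyperparameters assumed purely for illustration.

```python
import numpy as np

# Minimal tabular Q-learning update for a grid world, as in the excerpt
# above. The grid size and hyperparameters are assumptions for the sketch.

n_states, n_actions = 16, 4              # assumed 4x4 grid, 4 moves
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9                  # assumed learning rate and discount

def q_update(s, a, r, s_next):
    """One temporal-difference update of the Q-table entry (s, a)."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# example: taking action 1 in state 0 gave reward -1.0 and led to state 4
q_update(0, 1, -1.0, 4)
```

Because every state-action pair is a single table cell, the update is exact and cheap, which is why the excerpt calls Q-learning "simple and optimal" for small grid-world representations.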
“…Until now, very few studies have been reported that deal exclusively with the UAV trajectory design problem under a partially observed networking environment and unexpected mobility/locations of the IoT devices. In [20], the authors introduced a model-based deep reinforcement learning (DRL) UAV path planning algorithm for data collection, with a device localization mechanism that divides the ground nodes into those with known and those with unknown locations. Nonetheless, they assumed that the UAVs are given predetermined targets and that the IoT nodes are static with complete location information.…”
Section: A Literature Review (mentioning)
confidence: 99%
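The known/unknown split of ground nodes described in the excerpt can be pictured as simple bookkeeping that promotes a device to "known" once its location estimate becomes confident enough. The sketch below is purely illustrative and is not the localization mechanism of [20]; the field names and confidence threshold are assumptions.

```python
from dataclasses import dataclass, field

# Illustrative bookkeeping for a known/unknown device split; not the
# mechanism of [20]. Names and the threshold are assumptions.

@dataclass
class DeviceMap:
    known: dict = field(default_factory=dict)    # node_id -> (x, y) estimate
    unknown: set = field(default_factory=set)    # node ids still unlocalized

    def observe(self, node_id, estimate, confidence, threshold=0.9):
        """Promote a node to 'known' once its location estimate is confident."""
        if node_id in self.unknown and confidence >= threshold:
            self.known[node_id] = estimate
            self.unknown.discard(node_id)
```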