2019
DOI: 10.1016/j.ast.2019.06.024

Fast task allocation for heterogeneous unmanned aerial vehicles through reinforcement learning

citations
Cited by 79 publications
(41 citation statements)
references
References 9 publications
“…Value-iteration methods are often carried out off-policy, meaning that the policy used to generate behavior for training data can be unrelated to the policy being evaluated and improved, called the estimation policy [11,12]. Popular value-iteration methods used in dynamic task scheduling are Q-Learning [7,9,10,15-17] and Deep Q-Network (DQN) [3,8,18-20]. Apart from these two, Greedy methods [19], Monte Carlo Methods [21] and Temporal Difference (TD) Learning [22,23] also have been used.…”
Section: Value-iteration Methods (mentioning)
confidence: 99%
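
The off-policy distinction in the statement above can be illustrated with a minimal tabular Q-learning sketch. This is not the method of the cited paper or of the citing one: the five-state chain environment, the hyperparameters, and the names `step`, `q_learning`, and `greedy_policy` are all hypothetical, chosen only to show a uniformly random behavior policy generating data while the greedy estimation policy is the one being improved.

```python
import random

# Hypothetical toy chain: states 0..4, state 4 is terminal.
N_STATES = 5
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GAMMA = 0.9        # discount factor (assumed value)
ALPHA = 0.5        # learning rate (assumed value)


def step(state, action):
    """Deterministic transition; reward 1 only when the terminal state is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, seed=0):
    """Tabular Q-learning driven by a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behavior policy: random, unrelated to the policy being learned.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Estimation policy: greedy, expressed by the max over next actions.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

Although the data come entirely from random moves, the update target uses the maximum over next-state action values, so the learned values describe the greedy policy rather than the random one. That separation is what the statement calls off-policy.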
“…The Markov decision process is a mathematical model to describe the decision problem for an agent in an environment with the Markov property. The Markov property simply states that the future actions are independent of the past, given the present [3,10,11,14]. In the framework of reinforcement, dynamic task/resource allocation decisions and the choosing of long-term optimal actions based upon delayed rewards from the environment have been modeled as a Markov Decision Process.…”
Section: Deep Learning Reinforcement Learning and Deep Reinforcement (mentioning)
confidence: 99%
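
As a hedged aid to reading the statement above, the Markov property and the resulting optimality condition can be written in standard textbook notation. The symbols are assumptions and do not come from the quoted text: $S_t$ and $A_t$ are the state and action at time $t$, $P$ is the transition kernel, $R$ the reward function, and $\gamma \in [0, 1)$ the discount factor. The property is conventionally stated over the next state rather than over future actions.

```latex
% Markov property: the next state depends on the history only through
% the current state and action.
\Pr\left(S_{t+1} = s' \mid S_t, A_t, S_{t-1}, A_{t-1}, \dots, S_0, A_0\right)
  = \Pr\left(S_{t+1} = s' \mid S_t, A_t\right)

% Bellman optimality equation for the action-value function, which
% captures "long-term optimal actions based upon delayed rewards".
Q^{*}(s, a) = \sum_{s'} P(s' \mid s, a)
  \left[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \right]
```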