Fast task allocation for heterogeneous unmanned aerial vehicles through reinforcement learning

Zhao, Xinyi; Zong, Qun; Tian, Bailing; Zhang, Boyuan; You, Min

doi:10.1016/j.ast.2019.06.024

Cited by 79 publications

(41 citation statements)

References 9 publications

“…Value-iteration methods are often carried out off-policy, meaning that the policy used to generate behavior for training data can be unrelated to the policy being evaluated and improved, called the estimation policy [11,12]. Popular value-iteration methods used in dynamic task scheduling are Q-Learning [7,9,10,15-17] and Deep Q-Network (DQN) [3,8,18-20]. Apart from these two, Greedy methods [19], Monte Carlo Methods [21] and Temporal Difference (TD) Learning [22,23] also have been used.…”

Section: Value-iteration Methods (mentioning)

confidence: 99%
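
To make the off-policy distinction in the statement above concrete, here is a minimal Python sketch of tabular Q-learning. It assumes a hypothetical five-state chain environment that appears in neither the cited nor the citing paper. Behavior is generated by a uniformly random policy, while the update target uses the greedy (estimation) policy. The function names, environment, and hyperparameters are illustrative assumptions.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical chain of states 0..n_states-1.

    Actions: 0 = move left, 1 = move right. Entering the last state
    ends the episode and pays reward 1; every other step pays 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    terminal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != terminal:
            # Behavior policy: uniformly random, unrelated to the learned policy.
            a = rng.randrange(2)
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == terminal else 0.0
            # Estimation policy: greedy, expressed by the max over next actions.
            # The terminal row is never updated, so its value stays 0.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Return the greedy action for every non-terminal state."""
    return [max(range(2), key=lambda a: row[a]) for row in q[:-1]]
```

Although the data come from random behavior, the learned greedy policy can still be read off the table with `greedy_policy(q_learning_chain())`, which is the sense in which the method is off-policy.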

“…The Markov decision process is a mathematical model to describe the decision problem for an agent in an environment with the Markov property. The Markov property simply states that, the future actions are independent on the past, given the present [3,10,11,14]. In the framework of reinforcement, dynamic task/resource allocation decisions and the choosing of long-term optimal actions based upon delayed rewards from the environment have been modeled as a Markov Decision Process.…”

Section: Deep Learning Reinforcement Learning and Deep Reinforcement (mentioning)

confidence: 99%

“…Zhao et al [10] have presented a fast task allocation (FTA) algorithm developed using Q-learning. The task allocation scheme is involved with approximation of neural networks and prioritization of experience replay.…”

Section: Q-learning (mentioning)

confidence: 99%
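
The statement above mentions prioritized experience replay without giving details. As a rough illustration only, the sketch below assumes the common proportional scheme, in which a transition is sampled with probability proportional to |TD error|^alpha. The class name, the alpha value, and the list-based storage are hypothetical and are not taken from Zhao et al.

```python
import random


class PrioritizedReplay:
    """Hypothetical proportional prioritized replay buffer (illustrative only)."""

    def __init__(self, capacity, alpha=0.6, seed=0):
        self.capacity = capacity
        self.alpha = alpha          # 0 gives uniform sampling, 1 gives fully proportional
        self.items = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, transition, td_error):
        # Evict the oldest transition once the buffer is full.
        if len(self.items) >= self.capacity:
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        # A small constant keeps zero-error transitions sampleable.
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def probabilities(self):
        total = sum(self.priorities)
        return [p / total for p in self.priorities]

    def sample(self, k):
        # Draw k transitions with replacement, weighted by priority.
        return self.rng.choices(self.items, weights=self.priorities, k=k)
```

A learner would typically call `add` with each new transition and its current TD error, then train on `sample(batch_size)`; a tree-based structure would be the usual choice for large buffers, but a list keeps the sketch short.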

“…Unacceptable computation time for real-time implement and complexity of the algorithms is also a matter. Due to…

[Table 1: High-level review of reinforcement learning techniques used in dynamic task scheduling. Columns: Technique; Merits; Demerits. Row, cells in extracted order: Q-learning [7,9,10,15-17]; A very powerful algorithm; Challenged by the uncertainty (non-stationarity) of the environment [24]; Have produced considerable good results…]”

Section: Research Findings (mentioning)

confidence: 99%

“…Reinforcement learning is proficient in handling entirely diverse tasks in a changing environment by learning scheduling rules. Experience replay, ability to model various priority tasks at once is one of the significant advantages [10]. Reinforcement Learning has encouraging feasibility to solve the dynamic task scheduling problem.…”

Section: Introduction (mentioning)

confidence: 99%

Reinforcement Learning in Dynamic Task Scheduling: A Review

2020

Scheduling is assigning shared resources over time to efficiently complete the tasks over a given period of time. The term is applied separately for tasks and resources correspondingly in task scheduling and resource allocation. Scheduling is a popular topic in operational management and computer science. Effective schedules ensure system efficiency, effective decision making, minimize resource wastage and cost, and enhance overall productivity. It is generally a tedious task to choose the most accurate resources in performing work items and schedules in both computing and business process execution. Especially in real-world dynamic systems where multiple agents involve in scheduling various dynamic tasks is a challenging issue. Reinforcement Learning is an emergent technology which has been able to solve the problem of the optimal task and resource scheduling dynamically. This review paper is about a research study that focused on Reinforcement Learning techniques that have been used for dynamic task scheduling. The paper addresses the results of the study by means of the state-of-the-art on Reinforcement learning techniques used in dynamic task scheduling and a comparative review of those techniques.

Extended‐state‐observer‐based dynamic surface control of flexible‐joint robot systems with input saturation

Lian; Wang

2021

Adaptive Control & Signal

This article presents an extended-state-observer-based dynamic surface control approach for flexible-joint robot systems with asymmetric input saturation and large unknown dynamic knowledge. Traditional controllers for flexible-joint robot systems usually use approximation technology to deal with unknown dynamics knowledge. Unlike the traditional control algorithm, this article utilizes an extended state observer to estimate the unknown dynamics. For the closed-loop system, the delay strategy handles the time-scale separation issue, the filtering system overcomes the "explosion of differentiation" caused by the repeated differentiation of auxiliary control signals, and the mean-value-theorem solves the input saturation problem of the actuator. The stability analysis implies that estimation errors of extended state observers (ESOs) and other state variables are semiglobally uniformly ultimately bounded. Compared with fuzzy control algorithms, the novel ESO-based dynamic surface control approach not only omits online learning time but also uses only a few control parameters to obtain satisfactory tracking performance. Finally, a comparison simulation experiment is provided to illustrate the effectiveness of the gained conclusions.

Reinforcement learning control with function approximation via multivariate simplex splines

Feng; Zhou; et al.

2023

Adaptive Control & Signal

Summary: In the field of optimal control for continuous nonlinear systems, function approximation methods are often employed to overcome the curse of dimensionality. Compared to other global function approximators like neural networks, multivariate splines can be easily evaluated and adapted on a local basis with linearity in the parameters. In this work, a multivariate spline based reinforcement learning (RL) strategy is proposed for solving the continuous‐time nonlinear control problem. Based on the classic value iteration method, multivariate splines are integrated into RL algorithms to approximate continuous value functions and policy functions from discrete action and value samples. Hence, the determined splines with updated coefficients can be utilized in continuous control of nonlinear systems. In the simulation experiment, the performance of the spline‐based RL control is evaluated in controlling an under‐actuated inverted pendulum. The proposed method is compared with the value iteration based discrete control strategy and the neural network based continuous control strategy. The simulation results indicate that the proposed method based on multivariate splines has better control performance with less state oscillations, energy consumption and convergence time in comparison with discrete value iteration and neural network based RL, and the adoption of simplex splines improves the function approximation efficiency with less computation time than neural network optimization.
