2018 AIAA Information Systems-Aiaa Infotech @ Aerospace 2018
DOI: 10.2514/6.2018-2134
Flexible Heuristic Dynamic Programming for Reinforcement Learning in Quad-Rotors

Cited by 3 publications (7 citation statements)
References 35 publications
“…Moreover, it implies that the shaping aspect of a learning curriculum is not limited to the training distribution only, but also resides in the agent representation. This interpretation of curriculum learning can be recognized in the work by Helmer et al [21] on flexible heuristic dynamic programming.…”
Section: B. Curriculum Learning
Confidence: 88%
“…Although safe curriculum learning for flight control is still a largely unexplored research area, other related paradigms have been investigated that aim to address the safety and efficiency issues that undermine online reinforcement learning applications. Recently, Helmer et al [21] proposed the framework of flexible heuristic dynamic programming (HDP) and showed how decomposing a target task into a pair of successive tasks of lower complexity can expedite learning in the context of flight control. The concept of flexible function approximation can be regarded as a form of transfer learning (TL), a branch of RL research that deals specifically with how the large data dependency of the reinforcement learning paradigm can be reduced by transferring knowledge obtained from training on one task to another task in which learning has not yet taken place [22].…”
Section: Related Work
Confidence: 99%