Abstract: We present a recurrent neural-network (RNN) controller designed to solve the tracking problem for control systems. We demonstrate that a major difficulty in training any RNN is the problem of exploding gradients, and we propose a solution to this in the case of tracking problems, by introducing a stabilization matrix and by using carefully constrained context…
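To make the exploding-gradient issue concrete, the toy sketch below trains a small RNN controller on a first-order plant and simply clips the gradient norm, a generic remedy rather than the stabilization-matrix scheme the abstract refers to; the plant, network sizes, and learning rates are all invented for illustration.

```python
# Hypothetical illustration (not the paper's method): an RNN controller whose
# output weights are tuned by gradient descent, with gradient-norm clipping as
# a generic guard against exploding gradients.
import numpy as np

rng = np.random.default_rng(0)
n_h = 8                                    # hidden units (illustrative)
Wh = rng.normal(0, 0.1, (n_h, n_h))        # recurrent weights (kept fixed here)
Wx = rng.normal(0, 0.1, n_h)               # weights on the tracking error
Wu0 = rng.normal(0, 0.1, n_h)              # output (control) weights to be tuned

def tracking_cost(wu, T=40, ref=1.0):
    """Roll the RNN controller on a toy stable plant; return mean squared tracking error."""
    x, h, cost = 0.0, np.zeros(n_h), 0.0
    for _ in range(T):
        h = np.tanh(Wh @ h + Wx * (ref - x))   # controller state driven by the error
        u = float(wu @ h)                       # control action
        x = 0.9 * x + 0.1 * u                   # toy first-order plant
        cost += (ref - x) ** 2
    return cost / T

def clipped_grad(f, w, eps=1e-5, max_norm=1.0):
    """Finite-difference gradient of f at w, rescaled when its norm explodes."""
    g = np.array([(f(w + eps * e) - f(w - eps * e)) / (2 * eps) for e in np.eye(w.size)])
    n = np.linalg.norm(g)
    return g * (max_norm / n) if n > max_norm else g

w = Wu0.copy()
for _ in range(100):                       # simple gradient descent on the output weights
    w -= 0.2 * clipped_grad(tracking_cost, w)
print("final tracking cost:", tracking_cost(w))
```

Clipping only bounds the update size; the abstract's stabilization matrix and constrained context address the same failure mode inside the recurrence itself.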
“…where u_d is given by (4) and u_e^* is given by (9). Remark 2: The feedback part of the control input (9) is designed to stabilize the tracking error dynamics.…”
Section: Problem Formulation and Its Standard Solution (mentioning, confidence: 99%)
“…One can refer to [9], [45], and [46] for an exact gradient descent algorithm with improved convergence guarantees.…”
Section: Learning Rules for Actor and Critic NNs (mentioning, confidence: 99%)
“…Several techniques have been proposed to approximate the HJB solution. Included are reinforcement learning (RL) [1]- [8] and backpropagation through time [9]. RL techniques have been successfully applied to find the solution to the HJB equation online in real time for unknown or partially unknown continuous-time (CT) systems [10]- [12] and discrete-time (DT) systems [13]- [17].…”
This paper presents a partially model-free adaptive optimal control solution to the deterministic nonlinear discrete-time (DT) tracking control problem in the presence of input constraints. The tracking error dynamics and reference trajectory dynamics are first combined to form an augmented system. Then, a new discounted performance function based on the augmented system is presented for the optimal nonlinear tracking problem. In contrast to the standard solution, which finds the feedforward and feedback terms of the control input separately, the minimization of the proposed discounted performance function gives both feedback and feedforward parts of the control input simultaneously. This enables us to encode the input constraints into the optimization problem using a nonquadratic performance function. The DT tracking Bellman equation and tracking Hamilton-Jacobi-Bellman (HJB) equation are derived. An actor-critic-based reinforcement learning algorithm is used to learn the solution to the tracking HJB equation online without requiring knowledge of the system drift dynamics. That is, two neural networks (NNs), namely an actor NN and a critic NN, are tuned online and simultaneously to generate the optimal bounded control policy. A simulation example is given to show the effectiveness of the proposed method.
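The sketch below illustrates the general actor-critic flavour described in this abstract on a toy linear system: the augmented state stacks the tracking error and the reference, the critic is tuned on a discounted tracking Bellman residual, and the actor output is saturated as a crude stand-in for the input constraints. The quadratic features, REINFORCE-style actor update, gains, and plant are illustrative assumptions, not the paper's derivation.

```python
# Hedged sketch of online, simultaneous actor-critic tuning for a discounted
# tracking problem. Everything numerical here is invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.95, 0.10], [0.00, 0.90]])   # toy drift dynamics (used only by the simulator)
B = np.array([0.0, 0.1])                     # input matrix
gamma = 0.8                                  # discount factor of the tracking performance index
F = 0.99                                     # reference generator: r_{k+1} = F * r_k

def phi(z):
    """Quadratic critic features of the augmented state z = [e1, e2, r]."""
    return np.array([z[0]*z[0], z[1]*z[1], z[2]*z[2], z[0]*z[1], z[0]*z[2], z[1]*z[2]])

Wc = np.zeros(6)          # critic weights:  V(z) ~ Wc . phi(z)
Wa = np.zeros(3)          # actor weights:   u(z) ~ sat(Wa . z)
x, r = np.zeros(2), 1.0

for k in range(3000):                                    # online, simultaneous tuning
    e = x - np.array([r, 0.0])                           # tracking error (track the first state)
    z = np.concatenate([e, [r]])                         # augmented state
    noise = 0.05 * rng.normal()                          # exploration
    u = float(np.clip(Wa @ z + noise, -1.0, 1.0))        # saturated actor output (crude input constraint)
    stage = e @ e + 0.1 * u * u                          # stage cost (the paper uses a nonquadratic one)
    x = A @ x + B * u
    r = F * r
    z2 = np.concatenate([x - np.array([r, 0.0]), [r]])
    td = stage + gamma * (Wc @ phi(z2)) - Wc @ phi(z)    # discounted tracking Bellman residual
    Wc += 0.02 * td * phi(z)                             # critic: semi-gradient step on the residual
    Wa -= 0.01 * td * noise * z                          # actor: REINFORCE-style correction (illustrative)

print("critic weights:", np.round(Wc, 3))
print("actor gains:   ", np.round(Wa, 3))
```

Note that only measured transitions enter the updates, which is what makes this style of learning partially model-free; the drift matrices above exist only so the toy simulator can generate those transitions.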
“…The control strategy of a GFC is crucial to maintain the power quality and the security of the GFC system (Fairbank, Li, Fu, Alonso, & Wunsch, 2014). Both subsystems of a GFC may exhibit certain degrees of uncertainty and volatility.…”
Section: Introduction (mentioning, confidence: 99%)
“…In Fairbank et al (2014), Fu, Li, and Jaithwa (2015), Li et al (2014), recurrent neural networks (RNNs), as an intelligence control method, have been used to control GFC systems in which the RNN is trained by the back propagation through time (BPTT) algorithm. However, the training process is complex.…”
Three-phase grid-feeding converters (GFCs) are key components for integrating distributed generation and renewable power sources into the power utility. Conventionally, proportional-integral and proportional-resonant-based control strategies are applied to control the output power or current of a GFC, but these strategies have poor transient performance and are not robust against uncertainties and volatilities in the system. This paper proposes an H₂/H∞-based control strategy that mitigates these limitations. Uncertainty and disturbance are included in the GFC state-space model, so that it reflects practical system conditions more accurately. The paper uses a convex optimisation method to design the H₂/H∞-based optimal controller and, instead of a guess-and-check procedure, employs particle swarm optimisation to search for the H₂/H∞ optimal controller. Several case studies, implemented in both simulation and experiment, verify the superiority of the proposed control strategy over traditional PI control methods, especially under dynamic and variable system conditions.
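A rough illustration of the search step only: the snippet below runs a plain particle swarm over two controller gains, scoring each candidate with a stand-in cost that blends a nominal tracking term with a disturbance-peak term. It does not compute H₂ or H∞ norms, and the toy plant, weights, and bounds are assumptions made purely for the example.

```python
# Hedged sketch: particle swarm optimisation over controller gains with a
# surrogate "mixed" cost. All plants, weights and bounds are illustrative.
import numpy as np

rng = np.random.default_rng(2)

def mixed_cost(k):
    """Simulate a toy first-order loop; blend a nominal-performance term and a disturbance-peak term."""
    kp, ki = k
    def run(dist):
        x, z, peak, cost = 0.0, 0.0, 0.0, 0.0
        for _ in range(200):
            e = 1.0 - x                      # unit reference
            u = kp * e + ki * z              # PI-like control law
            z += 0.01 * e
            x += 0.01 * (-x + u + dist)      # toy plant with additive disturbance
            cost += 0.01 * e * e
            peak = max(peak, abs(e))
        return cost, peak
    j2, _ = run(0.0)                         # nominal-performance term ("H2-like" stand-in)
    _, jinf = run(0.5)                       # worst-error proxy under a step disturbance ("Hinf-like" stand-in)
    return j2 + 2.0 * jinf

# plain PSO over the two gains
n, dim = 20, 2
pos = rng.uniform(0.0, 5.0, (n, dim))
vel = np.zeros((n, dim))
pbest, pbest_f = pos.copy(), np.array([mixed_cost(p) for p in pos])
gbest = pbest[np.argmin(pbest_f)]
for _ in range(50):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0.0, 5.0)
    f = np.array([mixed_cost(p) for p in pos])
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[np.argmin(pbest_f)]
print("best gains:", gbest, "cost:", pbest_f.min())
```

The appeal of the swarm search, as opposed to guess-and-check tuning, is that the cost function only needs to be evaluable, not differentiable or convex in the gains.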
Over the last 40 years, the theory and technology of model predictive control (MPC) have developed rapidly. However, nonlinear MPC still faces difficulties such as high online computational complexity and the inability to model the system accurately. To address these problems, recent research has turned to learning-based control: learned models can capture unknown or highly uncertain nonlinearities, and the emergence of efficient algorithms has made online computation far more tractable. Stability is at the heart of control design, and although learning-based nonlinear model predictive control (LB-NMPC) has produced systematic research results over the past 10 years, the stability of LB-NMPC remains an open question that has not been fully addressed in the literature. This review summarizes the latest research progress on LB-NMPC. More specifically, it examines how learning techniques are used to handle uncertainty and online optimization for the considered systems, and briefly discusses research hotspots such as control stability and constraint satisfaction of LB-NMPC. Finally, applications of LB-NMPC in integrated circuits, path-tracking control, and other fields are reviewed, providing a reference for the research and application of LB-NMPC.
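As a minimal picture of the receding-horizon loop this review surveys, the sketch below wraps a nominal model plus a learned residual correction inside a short-horizon, random-shooting optimiser; the plant, the running-average "learning", and the shooting optimiser are deliberate simplifications, not any specific method from the literature.

```python
# Hedged, minimal LB-NMPC-style loop: a learned residual corrects a nominal
# model used for short-horizon prediction at every step. Purely illustrative.
import numpy as np

rng = np.random.default_rng(3)

def plant(x, u):                         # true dynamics, unknown to the controller
    return 0.9 * x + 0.2 * u + 0.05 * np.sin(x)

def nominal(x, u):                       # controller's imperfect prior model
    return 0.9 * x + 0.2 * u

residual = 0.0                           # learned correction: running mean of the one-step model error

def mpc(x, horizon=10, samples=200):
    """Random-shooting MPC: return the first input of the best sampled input sequence."""
    best_u, best_cost = 0.0, np.inf
    for _ in range(samples):
        us = rng.uniform(-1.0, 1.0, horizon)
        xp, cost = x, 0.0
        for u in us:
            xp = nominal(xp, u) + residual            # learned model used for prediction
            cost += (xp - 1.0) ** 2 + 0.01 * u ** 2   # track the setpoint 1.0, penalise effort
        if cost < best_cost:
            best_u, best_cost = us[0], cost
    return best_u

x = 0.0
for k in range(100):
    u = mpc(x)
    x_next = plant(x, u)
    residual += 0.1 * ((x_next - nominal(x, u)) - residual)   # online model learning
    x = x_next
print("final state (target 1.0):", x)
```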