This paper presents an application of reinforcement learning (RL) to the development of automated control systems. The method was successfully applied to the development of a control system that controls a pendulum. Advantages and disadvantages of RL control systems are described.

Development of an automated control system using classical methods of automated control theory presupposes exploration of the plant properties that influence the structure of the control system and its control algorithms. The difficulty of this task increases with the complexity of the plant. If the plant has nonlinear properties and they change over time, the difficulty increases significantly. Applying RL to control system development shifts the main attention from exploration of plant properties to development of a universal control system that is capable of adapting to the plant properties and guaranteeing the necessary control quality.

RL is a machine learning method that does not require supervised training examples. An RL control system is capable of producing a control signal close to the optimal value through trial-and-error exploration. In contrast to supervised learning, where the control system is provided with the optimal value of the control signal at every moment, an RL control system receives no such information. Instead of the optimal control signal, the RL control system is provided with a scalar reward signal that estimates how good the system's outputs are.

RL methods are developed for discrete interaction of the controller with the plant. The RL control system is shown in Fig. 1. The controller and the plant interact in discrete time steps i = 0, 1, 2, ... At every time step i the controller receives information about the current plant state si ∈ S, where S is the set of possible plant states, and according to this information it provides some control signal ai ∈ A(si), where A(si) is the set of control signals the controller can provide when the plant state is si. At every time step the controller also receives a scalar reward signal r.
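The discrete controller-plant interaction described above can be sketched as a simple loop. The `Plant` and `Controller` classes below are hypothetical illustrations (a toy first-order plant tracking a target of 1.0), not the pendulum plant from the paper:

```python
# Minimal sketch of the discrete RL interaction loop (Fig. 1):
# at each time step i the controller observes state s_i, provides
# control signal a_i, and the plant returns s_{i+1} and reward r_{i+1}.
# The Plant and Controller below are toy stand-ins, not the paper's pendulum.

class Plant:
    """Toy plant: the state drifts halfway toward the applied control signal."""
    def __init__(self):
        self.state = 0.0

    def step(self, action):
        self.state += 0.5 * (action - self.state)
        # Scalar reward: larger (closer to 0) when the state is near the target 1.0.
        reward = -abs(1.0 - self.state)
        return self.state, reward

class Controller:
    """Toy controller with a fixed policy: always command the target value."""
    def act(self, state):
        return 1.0  # a_i = pi(s_i); constant here purely for illustration

plant = Plant()
controller = Controller()
state = plant.state
for i in range(10):                      # discrete time steps i = 0, 1, 2, ...
    action = controller.act(state)       # controller provides a_i from A(s_i)
    state, reward = plant.step(action)   # plant returns s_{i+1} and r_{i+1}
```

With this toy plant the state contracts toward the target geometrically, so after ten steps it is within 0.1% of 1.0.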
The goal of the RL control system is to maximize the total reward signal R, which is calculated from the expression:

R = r_1 + γ·r_2 + γ²·r_3 + ...,  (1)

where γ is the discount factor.

Fig. 1. RL control system.

To achieve this goal the controller defines a policy π and a value function V. The policy π is a mapping from the set of plant states S onto the set of control signals A; it defines the control signal ai that must be provided when the plant state is si. The value of state si is the cumulative reward obtained by the controller from plant state si onward, defined by the expression [1]:

V(si) = r_{i+1} + γ·r_{i+2} + γ²·r_{i+3} + ... = r_{i+1} + γ·V(s_{i+1}).  (2)

At every time step the value of state si is updated according to the received reward signal r_{i+1}. The adjustment of the state value is calculated according to the expression:

V(si) = V(si) + α·(r_{i+1} + γ·V(s_{i+1}) − V(si)),  (3)

where α ∈ [0, 1] is a learning parameter [2].

Calculation of the value function V and the policy π is performed according to the following algorithm [3]:
1) define an arbitrary policy π0;
2) i = 0;
3) receive signals si, ri; provide the control signal ai = πi(si);
4) adjust the s...
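The state-value adjustment in expression (3) is the tabular temporal-difference (TD) update. A minimal sketch, assuming a hypothetical two-state transition chain rather than the paper's pendulum plant:

```python
# Sketch of the tabular value update from expression (3):
#   V(s_i) <- V(s_i) + alpha * (r_{i+1} + gamma * V(s_{i+1}) - V(s_i))
# The two-state chain (s0 -> s1 with reward 1, s1 -> s0 with reward 0)
# is an illustrative assumption, not the plant from the paper.

alpha = 0.1   # learning parameter, alpha in [0, 1]
gamma = 0.9   # discount factor

V = {"s0": 0.0, "s1": 0.0}

def td_update(V, s, r_next, s_next):
    """Apply expression (3) to the value estimate of state s."""
    V[s] += alpha * (r_next + gamma * V[s_next] - V[s])

# Repeatedly observe the two transitions, updating V at every time step.
for _ in range(1000):
    td_update(V, "s0", 1.0, "s1")
    td_update(V, "s1", 0.0, "s0")
```

For this chain, expression (2) gives the fixed point V(s0) = 1 + γ·V(s1) and V(s1) = γ·V(s0), i.e. V(s0) = 1/(1 − γ²) ≈ 5.26, which the iteration above converges to.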