This paper presents an application of reinforcement learning (RL) to the development of automated control systems. The method was successfully applied to the development of a control system that controls a pendulum. Advantages and disadvantages of RL control systems are described.

Development of an automated control system using classical methods of automated control theory presupposes exploration of the plant properties that influence the structure of the control system and its control algorithms. The difficulty of this task increases with the complexity of the plant. If the plant has nonlinear properties and they change over time, the difficulty increases significantly. Applying RL to control system development shifts the main attention from exploration of plant properties to development of a universal control system that is capable of adapting to the plant properties and guaranteeing the necessary control quality.

RL is a machine learning method that does not require supervised training examples. An RL control system is capable of producing a control signal close to the optimal value through trial-and-error exploration. In contrast to supervised learning, where the control system is provided with the optimal value of the control signal at every moment, an RL control system receives no such information. Instead of the optimal control signal, the RL control system is provided with a scalar reward signal that estimates how good the system's outputs are.

RL methods are developed for discrete interaction of the controller with the plant. The RL control system is shown in Fig. 1. The controller and the plant interact in discrete time steps i = 0, 1, 2, ... At every time step i the controller receives information about the current plant state si ∈ S, where S is the set of possible plant states, and according to this information it provides some control signal ai ∈ A(si), where A(si) is the set of control signals the controller can provide when the plant state is si. At every time step the controller also receives a scalar reward signal r.
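The discrete controller-plant interaction described above can be sketched as a simple loop. The `Plant` and `Controller` classes below are hypothetical illustrations (a toy first-order plant tracking a target of 1.0), not the pendulum plant from the paper:

```python
# Minimal sketch of the discrete RL interaction loop (Fig. 1):
# at each time step i the controller observes state s_i, provides
# control signal a_i, and the plant returns s_{i+1} and reward r_{i+1}.
# The Plant and Controller below are toy stand-ins, not the paper's pendulum.

class Plant:
    """Toy plant: the state drifts halfway toward the applied control signal."""
    def __init__(self):
        self.state = 0.0

    def step(self, action):
        self.state += 0.5 * (action - self.state)
        # Scalar reward: larger (closer to 0) when the state is near the target 1.0.
        reward = -abs(1.0 - self.state)
        return self.state, reward

class Controller:
    """Toy controller with a fixed policy: always command the target value."""
    def act(self, state):
        return 1.0  # a_i = pi(s_i); constant here purely for illustration

plant = Plant()
controller = Controller()
state = plant.state
for i in range(10):                      # discrete time steps i = 0, 1, 2, ...
    action = controller.act(state)       # controller provides a_i from A(s_i)
    state, reward = plant.step(action)   # plant returns s_{i+1} and r_{i+1}
```

With this toy plant the state contracts toward the target geometrically, so after ten steps it is within 0.1% of 1.0.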
The goal of the RL control system is to maximize the total reward signal R, which is calculated from the expression:

R = r_1 + γ·r_2 + γ²·r_3 + ...,  (1)

where γ is the discount factor.

Fig. 1. RL control system.

To achieve this goal the controller defines a policy π and a value function V. The policy π is a mapping from the set of plant states S onto the set of control signals A; it defines the control signal ai that must be provided when the plant state is si. The value of state si is the cumulative reward obtained by the controller from plant state si onward, defined by the expression [1]:

V(si) = r_{i+1} + γ·r_{i+2} + γ²·r_{i+3} + ... = r_{i+1} + γ·V(s_{i+1}).  (2)

At every time step the value of state si is updated according to the received reward signal r_{i+1}. The adjustment of the state value is calculated according to the expression:

V(si) = V(si) + α·(r_{i+1} + γ·V(s_{i+1}) − V(si)),  (3)

where α ∈ [0, 1] is a learning parameter [2].

Calculation of the value function V and the policy π is performed according to the following algorithm [3]:
1) define an arbitrary policy π0;
2) i = 0;
3) receive signals si, ri; provide the control signal ai = πi(si);
4) adjust the s...
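The state-value adjustment in expression (3) is the tabular temporal-difference (TD) update. A minimal sketch, assuming a hypothetical two-state transition chain rather than the paper's pendulum plant:

```python
# Sketch of the tabular value update from expression (3):
#   V(s_i) <- V(s_i) + alpha * (r_{i+1} + gamma * V(s_{i+1}) - V(s_i))
# The two-state chain (s0 -> s1 with reward 1, s1 -> s0 with reward 0)
# is an illustrative assumption, not the plant from the paper.

alpha = 0.1   # learning parameter, alpha in [0, 1]
gamma = 0.9   # discount factor

V = {"s0": 0.0, "s1": 0.0}

def td_update(V, s, r_next, s_next):
    """Apply expression (3) to the value estimate of state s."""
    V[s] += alpha * (r_next + gamma * V[s_next] - V[s])

# Repeatedly observe the two transitions, updating V at every time step.
for _ in range(1000):
    td_update(V, "s0", 1.0, "s1")
    td_update(V, "s1", 0.0, "s0")
```

For this chain, expression (2) gives the fixed point V(s0) = 1 + γ·V(s1) and V(s1) = γ·V(s0), i.e. V(s0) = 1/(1 − γ²) ≈ 5.26, which the iteration above converges to.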