2020
DOI: 10.33448/rsd-v9i2.2128

Tuning heuristics and convergence analysis of reinforcement learning algorithm for online data-based optimal control design

Abstract: A heuristic for tuning and convergence analysis of a reinforcement learning algorithm for output-feedback control, using only input/output data generated by a model, is presented. To support the convergence analysis, the parameters of the algorithms used for data generation must be adjusted and the control problem solved iteratively. A heuristic is proposed to adjust the data generator parameters, creating surfaces that assist in the convergence and robustness analysis of the optimal…

Cited by 2 publications (3 citation statements)
References 31 publications
“…The mathematical models used in operations research (OR) constitute a simplified abstraction of reality, represented by a set of actions and reactions, from which the model uses mathematical symbols to represent the decision variables of the real system (Silva & Neto, 2020). The accuracy of the model and its capture of the essential aspects of this reality are directly tied to the quality of the mathematical model constructed (Caixeta-Filho, 2001; Battesini et al., 2018).…”
Section: Operations Research and Problem Modeling (unclassified)
“…The presented value iteration algorithm can be solved online using standard methods, such as the least squares or recursive least squares (RLS). In [37], the RLS method was applied in the Q-learning algorithm to solve Eq. (26) in the following form:…”
Section: Implementation Noise and Discount Factor (mentioning)
confidence: 99%
“…The motivation behind the proposed state reconstruction is that it is not directly or explicitly based on mathematical models for output feedback [34] and [35], which include methods for the on-line design of optimal controllers based on ADP [36]. As our proposed reconstruction is data-dependent on the measured signal, it is referred to as a "data-driven approach for state reconstruction" that uses RL to approximate the value function of the Hamilton-Jacobi-Bellman (HJB) equation [37].…”
Section: Introduction (mentioning)
confidence: 99%
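
One of the citation statements above mentions solving the value iteration step online with recursive least squares (RLS). As a rough illustration of what such an update looks like, the following is a minimal sketch of a generic RLS estimator for a linear model y ≈ φᵀθ. It is not the cited paper's Eq. (26) or its Q-learning implementation; the function names (`rls_update`, `fit_rls`), the forgetting factor `lam`, the initial covariance scale `delta`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rls_update(theta, P, phi, y, lam=1.0):
    """One generic RLS step for the linear model y ~ phi^T theta.

    theta: (n, 1) current parameter estimate
    P:     (n, n) current inverse-correlation matrix
    phi:   (n,)   regressor for this sample
    y:     float  measured target for this sample
    lam:   forgetting factor (1.0 means no forgetting); hypothetical default
    """
    phi = phi.reshape(-1, 1)
    # Gain vector: how strongly this sample corrects the estimate.
    gain = P @ phi / (lam + (phi.T @ P @ phi).item())
    # A-priori prediction error using the current estimate.
    err = y - (phi.T @ theta).item()
    theta = theta + gain * err
    # Update the inverse-correlation matrix.
    P = (P - gain @ phi.T @ P) / lam
    return theta, P

def fit_rls(Phi, Y, lam=1.0, delta=1e6):
    """Run RLS over all samples, starting from theta = 0 and P = delta * I."""
    n = Phi.shape[1]
    theta = np.zeros((n, 1))
    P = delta * np.eye(n)
    for phi, y in zip(Phi, Y):
        theta, P = rls_update(theta, P, phi, y, lam)
    return theta.ravel()

# Synthetic, noise-free data (illustrative only, not from the paper).
rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0, 0.5])
Phi = rng.normal(size=(200, 3))
Y = Phi @ true_theta
theta_hat = fit_rls(Phi, Y)
```

In a Q-learning setting, the regressor would typically be built from measured input/output data and the target from the Bellman recursion, but those details depend on the cited formulation and are not reproduced here.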