2014 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2014.6889733
Approximate model-assisted Neural Fitted Q-Iteration

Abstract: In this work, we propose an extension to the Neural Fitted Q-Iteration algorithm that utilizes a learned model to generate virtual trajectories, which are used for updating the Q-function. Compared to standard NFQ, this combination has the potential to greatly reduce the amount of system interaction required to learn a good policy. At the same time, the approach still maintains the generalization ability of Q-learning. We provide a general formulation for approximate model-assisted fitted Q-learning, a…
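The abstract's core idea — fitted Q-iteration on a batch of real transitions augmented with virtual transitions generated by a learned model — can be sketched on a toy problem. This is an illustrative example, not the paper's implementation: the paper fits a neural Q-function, while the sketch below uses a tabular Q-function and an empirical transition table on a small deterministic chain MDP; all names (`true_step`, `batch`, `virtual`) are assumptions.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9

def true_step(s, a):
    # Toy environment: action 1 moves right, action 0 moves left;
    # reward is received whenever the rightmost state is entered.
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

# 1) Collect a small batch of real system interaction.
rng = np.random.default_rng(0)
batch = []
for _ in range(200):
    s, a = int(rng.integers(N_STATES)), int(rng.integers(N_ACTIONS))
    s2, r = true_step(s, a)
    batch.append((s, a, r, s2))

# 2) Learn a model from the batch (here a simple empirical
#    successor table; the environment is deterministic).
model = {(s, a): (s2, r) for s, a, r, s2 in batch}

# 3) Generate virtual transitions from the learned model and run
#    fitted Q-iteration on real + virtual data combined.
virtual = [(s, a, model[(s, a)][1], model[(s, a)][0])
           for s in range(N_STATES) for a in range(N_ACTIONS)
           if (s, a) in model]

Q = np.zeros((N_STATES, N_ACTIONS))
for _ in range(200):
    targets = np.zeros_like(Q)
    counts = np.zeros_like(Q)
    for s, a, r, s2 in batch + virtual:
        targets[s, a] += r + GAMMA * Q[s2].max()
        counts[s, a] += 1
    Q = np.where(counts > 0, targets / np.maximum(counts, 1), Q)

greedy = Q.argmax(axis=1)  # greedy policy: move right toward the reward
```

In this deterministic toy case the virtual transitions are exact, so they only densify coverage; the paper's point is that with a learned approximate model on a real system, the same mechanism can substitute for costly additional system interaction.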

Citations: cited by 18 publications (20 citation statements)
References: 9 publications referenced (12 reference statements)
“…This domain knowledge can be incorporated through, e.g., information regarding the shape of the policy [40] or through using a model similar to that in [53]. A second point of…”
Section: Discussion
mentioning confidence: 99%
“…In [42] the authors conclude that, especially for non-linear control problems, FQI can be a valuable alternative to MPC approaches, with the extra advantage that FQI is a blind technique. Moreover, FQI and MPC can strengthen each other [42,53]. Although several BRL techniques have been proposed in the literature [36,52,44], this work focuses on FQI using extremely randomized trees as the regression algorithm [47].…”
Section: Step 2: Batch Reinforcement Learning
mentioning confidence: 99%
“…For example, the authors of [29] present a model-based policy search method that learns a Gaussian process to model uncertainties. In addition, inspired by [30], the authors of [31] demonstrate how a model-assisted batch RL technique can be applied to control a building heating system.…”
Section: Reinforcement Learning
mentioning confidence: 99%
“…Several different MBRL approaches have been developed in the literature over the last few decades. Imaginary roll-outs, i.e., the use of a model as a proxy for the real world to evaluate temporal difference errors (referred to as Bellman errors (BEs) in this paper), are explored in results such as [22] and [23]. While the sample efficiency of the policy learning algorithms is improved, the performance of the method in [22] decays rapidly with model mismatch, and the method in [23] relies on fitting neural networks to dynamics, which is typically data-intensive, nullifying the sample efficiency gain in the policy learning algorithm.…”
Section: Introduction
mentioning confidence: 99%