2005
DOI: 10.1007/11564096_32

Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method

Abstract: This paper introduces NFQ, an algorithm for efficient and effective training of a Q-value function represented by a multi-layer perceptron. Based on the principle of storing and reusing transition experiences, a model-free, neural network based Reinforcement Learning algorithm is proposed. The method is evaluated on three benchmark problems. It is shown empirically, that reasonably few interactions with the plant are needed to generate control policies of high quality.

Citations: cited by 640 publications (573 citation statements)
References: 7 publications
“…FQI uses a batch-trained function approximator (FA) as action-value function. Various types of non-linear function approximators have been successfully used with FQI, e.g., Neural Networks [12], Gaussian Processes [2], and others [9]. In this paper, we will use Locally Weighted Projection Regression (LWPR) [15] as the value function approximator of choice, as it is a fast robust online method that can handle large amounts of data.…”
Section: Solving the POMDP
Citation type: mentioning
confidence: 99%
“…As LAWER is a Fitted Q-iteration (FQI) (Ernst et al., 2005; Riedmiller, 2005) based algorithm, we quickly review the relevant concepts. FQI is a batch mode reinforcement learning (BMRL) algorithm.…”
Section: Fitted Q-iteration
Citation type: mentioning
confidence: 99%
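
The batch-mode loop that the statements above describe can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the regressor is swapped for plain least squares on one-hot (state, action) features instead of a multi-layer perceptron or LWPR, and the function name `fitted_q_iteration`, the three-state chain, and the discount factor are all hypothetical choices made for the example.

```python
import numpy as np

def fitted_q_iteration(transitions, n_states, n_actions, gamma=0.9, n_iters=50):
    """Batch FQI on a fixed set of (s, a, r, s_next, done) transitions.

    Assumption: the function approximator is least squares on one-hot
    (state, action) features, chosen only to keep the sketch dependency-free.
    """
    s, a, r, s_next, done = (np.array(col) for col in zip(*transitions))
    # One-hot design matrix: one column per (state, action) pair.
    X = np.zeros((len(s), n_states * n_actions))
    X[np.arange(len(s)), s * n_actions + a] = 1.0
    w = np.zeros(n_states * n_actions)  # start from Q_0 = 0 everywhere
    for _ in range(n_iters):
        q_next = w.reshape(n_states, n_actions)[s_next]  # shape (N, n_actions)
        # Regression targets built from the previous Q estimate.
        targets = r + gamma * (1.0 - done) * q_next.max(axis=1)
        # Supervised step: refit the approximator to the whole batch at once.
        w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return w.reshape(n_states, n_actions)

# Hypothetical chain 0 - 1 - 2: action 0 moves left, action 1 moves right,
# and entering state 2 yields reward 1 and ends the episode.
batch = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 0.0, 0, False),
    (1, 1, 1.0, 2, True),
]
Q = fitted_q_iteration(batch, n_states=3, n_actions=2)
greedy_policy = Q.argmax(axis=1)
```

The stored transitions are reused unchanged in every iteration; only the regression targets move, which is the sense in which the method is batch-trained.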
“…The update to the critic FA is simply equation (3). Denoting the output of the actor FA at time $t$ as $Ac_t(s_t)$ and its parameter vector as $\theta^{Ac}$, the update to the parameters of the actor is:…”
Section: A Continuous Actor Critic Learning Automaton
Citation type: mentioning
confidence: 99%
“…This paper will only discuss online algorithms and will therefore not cover batch algorithms for similar problems. This automatically excludes batch algorithms such as Episodic Natural Actor Critic [2] and Neural Fitted Q Iteration [3]. Since CACLA is easily extended to a batch algorithm, in the future it may be interesting to compare a set of batch algorithms including the aforementioned ones to the batch version of CACLA.…”
Section: Introduction
Citation type: mentioning
confidence: 99%