Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method

Riedmiller, Martin

doi:10.1007/11564096_32

Cited by 640 publications (573 citation statements, 561 of them classified as mentioning)

References 7 publications

“…FQI uses a batch-trained function approximator (FA) as its action-value function. Various types of non-linear function approximators have been used successfully with FQI, e.g., neural networks [12], Gaussian processes [2], and others [9]. In this paper, we will use Locally Weighted Projection Regression (LWPR) [15] as the value function approximator of choice, as it is a fast, robust online method that can handle large amounts of data.…”

Section: Solving the POMDP (mentioning)

confidence: 99%

Sequential Feature Selection for Classification

Rückstieß

Osendorfer

Smagt

2011

Lecture Notes in Computer Science


Abstract. In most real-world information processing problems, data is not a free resource; its acquisition is often time-consuming, expensive, or both. We investigate how these two factors can be included in supervised classification tasks by formulating classification as a sequential decision process and making it accessible to Reinforcement Learning. Our method performs a sequential feature selection that learns which features are most informative at each timestep, choosing the next feature depending on the already selected features and the internal belief of the classifier. Experiments on a handwritten digits classification task show a significant reduction in the data required for correct classification, while a medical diabetes prediction task illustrates variable feature cost minimization as a further property of our algorithm.
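The first citing statement describes FQI as repeatedly fitting a batch-trained function approximator to the action-value function. The loop can be sketched in Python under stated assumptions: the transition batch is fixed and collected beforehand, states and actions are small integers, and the neural network that NFQ trains is replaced by a hypothetical `fit_table` regressor that simply averages the targets for each (s, a) pair (the least-squares fit of a lookup table). The names `fit_table` and `fitted_q_iteration` and the 4-state chain task are illustrative, not taken from the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

State = int
Action = int
# (s, a, r, s_next, terminal): one stored transition from the fixed batch
Transition = Tuple[State, Action, float, State, bool]
QFunction = Callable[[State, Action], float]


def fit_table(inputs: List[Tuple[State, Action]], targets: List[float]) -> QFunction:
    """Hypothetical stand-in regressor: mean target per (s, a) pair.

    NFQ would train a neural network here instead; any batch regressor with
    the same fit(inputs, targets) -> predictor shape can be plugged in.
    """
    sums: Dict[Tuple[State, Action], float] = defaultdict(float)
    counts: Dict[Tuple[State, Action], int] = defaultdict(int)
    for x, y in zip(inputs, targets):
        sums[x] += y
        counts[x] += 1
    table = {x: sums[x] / counts[x] for x in sums}
    # Pairs never seen in the batch default to 0.0
    return lambda s, a: table.get((s, a), 0.0)


def fitted_q_iteration(
    transitions: List[Transition],
    actions: List[Action],
    gamma: float = 0.9,
    iterations: int = 50,
    fit: Callable[[List[Tuple[State, Action]], List[float]], QFunction] = fit_table,
) -> QFunction:
    q: QFunction = lambda s, a: 0.0  # Q_0 is identically zero
    inputs = [(s, a) for s, a, _, _, _ in transitions]
    for _ in range(iterations):
        # Targets for Q_{k+1} come from the same fixed batch and the current Q_k
        targets = [
            r if done else r + gamma * max(q(s2, b) for b in actions)
            for _, _, r, s2, done in transitions
        ]
        # One supervised batch fit per iteration
        q = fit(inputs, targets)
    return q


# Illustrative 4-state chain: action 1 moves right, action 0 moves left;
# entering state 3 pays reward 1 and ends the episode.
actions = [0, 1]
batch: List[Transition] = []
for s in range(3):
    right = s + 1
    batch.append((s, 1, 1.0 if right == 3 else 0.0, right, right == 3))
    batch.append((s, 0, 0.0, max(s - 1, 0), False))

q = fitted_q_iteration(batch, actions)
print([round(q(s, 1), 3) for s in range(3)])  # → [0.81, 0.9, 1.0]
```

Because `fit` is a parameter, swapping in a neural network or another batch regressor changes only the function passed in; building targets from the fixed batch and refitting once per iteration stays the same.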


“…As LAWER is a Fitted Q-iteration (FQI) (Ernst et al., 2005; Riedmiller, 2005) based algorithm, we quickly review the relevant concepts. FQI is a batch mode reinforcement learning (BMRL) algorithm.…”

Section: Fitted Q-iteration (mentioning)

confidence: 99%

Learning complex motions by sequencing simpler motion templates

Neumann

Maass

Peters

2009

Proceedings of the 26th Annual International Conference on Machine Learning


Abstraction of complex, longer motor tasks into simpler elemental movements enables humans and animals to exhibit motor skills which have not yet been matched by robots. Humans intuitively decompose complex motions into smaller, simpler segments. For example, when describing simple movements like drawing a triangle with a pen, we can easily name the basic steps of this movement. Surprisingly, such abstractions have rarely been used in artificial motor skill learning algorithms. These algorithms typically choose a new action (such as a torque or a force) at a very fast time-scale. As a result, both the policy search and the temporal credit assignment problems become unnecessarily complex, often beyond the reach of current machine learning methods. We introduce a new framework for temporal abstractions in reinforcement learning (RL), i.e., RL with motion templates. We present a new algorithm for this framework which can learn high-quality policies by making only a few abstract decisions.


“…The update to the critic FA is simply equation (3). Denoting the output of the actor FA at time $t$ as $Ac_t(s_t)$ and its parameter vector as $\theta^{Ac}$, the update to the parameters of the actor is:…”

Section: A Continuous Actor Critic Learning Automaton (mentioning)

confidence: 99%

“…This paper will only discuss online algorithms and will therefore not cover batch algorithms for similar problems. This automatically excludes batch algorithms such as Episodic Natural Actor Critic [2] and Neural Fitted Q Iteration [3]. Since CACLA is easily extended to a batch algorithm, in the future it may be interesting to compare a set of batch algorithms including the aforementioned ones to the batch version of CACLA.…”

Section: Introduction (mentioning)

confidence: 99%

Reinforcement Learning in Continuous Action Spaces

Hasselt

Wiering

2007

2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning



Abstract. A considerable amount of research has been done on Reinforcement Learning in continuous environments, but research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms named Continuous Actor Critic Learning Automaton (CACLA) that can handle continuous states and actions. The resulting algorithm is straightforward to implement. An experimental comparison is made between this algorithm and other algorithms that can handle continuous action spaces. These experiments show that CACLA performs much better than the other algorithms, especially when it is combined with a Gaussian exploration method.
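The CACLA statements above refer to separate critic and actor updates, but the actor equation is cut off. A sketch of the CACLA rule, assuming both function approximators are linear in a feature vector `phi`: the critic takes a TD(0) step on the TD error $\delta_t$, and only when $\delta_t > 0$ is the actor moved toward the action actually taken, $\theta^{Ac} \leftarrow \theta^{Ac} + \alpha\,(a_t - Ac_t(s_t))\,\partial Ac_t(s_t)/\partial \theta^{Ac}$. Exploration is Gaussian around the actor's output. The class `LinearCacla`, its parameter names, and the step sizes are hypothetical choices for illustration.

```python
import random
from typing import List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


class LinearCacla:
    """Hypothetical CACLA agent: linear critic V(s) = w.phi, linear actor Ac(s) = theta.phi."""

    def __init__(self, n_features: int, alpha: float = 0.1, beta: float = 0.1,
                 gamma: float = 0.95, sigma: float = 0.3, seed: int = 0) -> None:
        self.w: List[float] = [0.0] * n_features      # critic parameters
        self.theta: List[float] = [0.0] * n_features  # actor parameters (theta^Ac)
        self.alpha, self.beta = alpha, beta           # actor / critic step sizes
        self.gamma, self.sigma = gamma, sigma         # discount / exploration std
        self.rng = random.Random(seed)

    def value(self, phi: Sequence[float]) -> float:
        return dot(self.w, phi)

    def actor(self, phi: Sequence[float]) -> float:
        return dot(self.theta, phi)

    def act(self, phi: Sequence[float]) -> float:
        # Gaussian exploration around the actor's deterministic output
        return self.rng.gauss(self.actor(phi), self.sigma)

    def update(self, phi: Sequence[float], a: float, r: float,
               phi_next: Sequence[float], terminal: bool) -> float:
        v_next = 0.0 if terminal else self.value(phi_next)
        delta = r + self.gamma * v_next - self.value(phi)  # TD error
        # Critic: TD(0) step; the gradient of a linear V with respect to w is phi
        self.w = [wi + self.beta * delta * xi for wi, xi in zip(self.w, phi)]
        # Actor: only if the action turned out better than expected,
        # move Ac(s) toward the action that was actually taken
        if delta > 0:
            err = a - self.actor(phi)
            self.theta = [ti + self.alpha * err * xi for ti, xi in zip(self.theta, phi)]
        return delta


agent = LinearCacla(n_features=2, alpha=0.5, beta=0.5, gamma=0.9)
phi, phi_next = [1.0, 0.0], [0.0, 1.0]
d1 = agent.update(phi, a=1.0, r=1.0, phi_next=phi_next, terminal=False)
print(d1, agent.actor(phi))  # → 1.0 0.5
d2 = agent.update(phi, a=-1.0, r=-1.0, phi_next=phi_next, terminal=False)
print(d2, agent.actor(phi))  # → -1.5 0.5
```

The second update has a negative TD error, so the critic is still corrected but the actor is left unchanged; updating the actor toward the taken action rather than along the size of $\delta_t$ is what distinguishes CACLA from a standard actor-critic.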
