1999
DOI: 10.1007/3-540-46695-9_35

Q-Learning in Continuous State and Action Spaces

Abstract: Q-learning can be used to learn a control policy that maximises a scalar reward through interaction with the environment. Q-learning is commonly applied to problems with discrete states and actions. We describe a method suitable for control tasks which require continuous actions, in response to continuous states. The system consists of a neural network coupled with a novel interpolator. Simulation results are presented for a non-holonomic control task. Advantage Learning, a variation of Q-learning, is…
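
The abstract describes a neural network coupled with a novel interpolator for handling continuous actions. One well-known interpolation scheme for this setting, also named in the citation excerpts below, is wire fitting: the network outputs a small set of candidate actions with associated Q-values for the current state, and a distance-weighted interpolation defines Q(s, a) for any action in between. The sketch below shows that style of interpolator only as an illustration; the function names, the smoothing constant c, and the toy wires are assumptions, not taken from the paper.

```python
import numpy as np

def wire_fit_q(a, wires, c=0.1, eps=1e-6):
    """Interpolate Q(s, a) from a set of (action, q) 'wires' produced
    for the current state (wire-fitting style interpolation).

    a     : query action, shape (action_dim,)
    wires : list of (a_i, q_i) pairs, a_i shape (action_dim,), q_i scalar
    c     : smoothing factor trading off distance in action space vs. Q space
    eps   : small constant to avoid division by zero
    """
    actions = np.array([w[0] for w in wires])        # (n, action_dim)
    qs = np.array([w[1] for w in wires])             # (n,)
    q_max = qs.max()
    # Distance term: squared action distance plus a penalty for low-valued wires.
    dist = np.sum((actions - a) ** 2, axis=1) + c * (q_max - qs) + eps
    weights = 1.0 / dist
    return float(np.sum(weights * qs) / np.sum(weights))

def greedy_action(wires):
    # The interpolated surface attains its maximum at the highest-valued wire,
    # so the greedy continuous action needs no inner optimisation.
    return max(wires, key=lambda w: w[1])[0]

if __name__ == "__main__":
    # Toy example: three wires for a 1-D action space.
    wires = [(np.array([-1.0]), 0.2),
             (np.array([0.0]), 0.9),
             (np.array([1.0]), 0.4)]
    print(wire_fit_q(np.array([0.1]), wires))  # close to 0.9
    print(greedy_action(wires))                # array([0.])
```

Because the interpolation passes through the wires and never exceeds the largest q-value, maximising over the continuous action space reduces to picking the best wire, which is what makes this family of interpolators attractive for Q-learning with continuous actions.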

Cited by 98 publications (54 citation statements)
References 11 publications
“…Note that the agent is faced with the delayed-reward problem, and that it must take the distance to the two exits into consideration for choosing the most attractive one. The maze has a ground carpeted with a color image of 1280 × 1280 pixels, that is a montage of pictures from the COIL-100 database 5 . The agent does not have direct access to its (x, y) position in the maze.…”
Section: Results (mentioning, confidence: 99%)
“…Furthermore, an a priori discretization of the action space generally suffers from an explosion of the representational size of the domains known as the curse of dimensionality, and may introduce artificial noise. Previously-investigated solutions for handling continuous actions without a priori discretization generally use function approximators such as neural networks [3], tile coding [4], or wire fitting [5]. However, to the best of our knowledge, none of these methods can cope simultaneously with high-dimensional, discrete perceptual spaces.…”
Section: Introduction (mentioning, confidence: 99%)
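
The "explosion of the representational size" mentioned in the excerpt above is easy to make concrete: discretising each of d continuous action dimensions into k bins yields k^d joint actions. A tiny illustrative computation (the choice of k = 10 is arbitrary):

```python
# Curse of dimensionality for a priori action discretisation:
# k bins per dimension and d action dimensions give k**d joint actions.
k = 10  # illustrative number of bins per dimension
for d in (1, 2, 4, 8):
    print(f"{d} action dimension(s) -> {k**d:,} discrete actions")
# 1 -> 10, 2 -> 100, 4 -> 10,000, 8 -> 100,000,000
```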
“…Several Reinforcement Learning (RL) algorithms have addressed the problem of learning to perform well in a continuous environment that is not perfectly modeled. Model-free RL approaches, such as Q-Learning [6] and policy gradient descent [7], are capable of improving robot performance without explicitly modeling the world. While this generality is appealing and necessary in situations where modeling is impractical, learning tends to be less data-efficient and is not generalizable to different tasks within the same environment [8].…”
Section: Related Work (mentioning, confidence: 99%)
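
For reference, the Q-Learning named in the excerpt above is, in its basic model-free form, a tabular algorithm over discrete states and actions; the continuous-action methods discussed in this paper extend that baseline. A minimal sketch follows; the environment interface (reset/step) and the hyperparameter values are illustrative assumptions, not drawn from any of the cited papers.

```python
import random

def q_learning_episode(env, Q, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Run one episode of epsilon-greedy tabular Q-learning.

    Q is a mapping from (state, action) to value, e.g. a
    collections.defaultdict(float) so that unseen pairs start at zero.
    env is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done).
    """
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection over a finite action set.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)
        # Q-learning bootstraps from the greedy value of the next state,
        # regardless of the action actually taken there (off-policy update).
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
    return Q
```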
“…How best to learn multiple modes of behavior is an interesting open challenge, the impact of which is magnified by other challenges common in domains requiring intelligent behavior: partial observability (Sutton and Barto, 1998), continuous state and action spaces (Gaskett et al, 1999), and noisy evaluations. Taking on these challenges, this dissertation develops methods specifically aimed at discovering multimodal behavior.…”
Section: Challenge (mentioning, confidence: 99%)