1999
DOI: 10.1007/3-540-46695-9_35

Q-Learning in Continuous State and Action Spaces

Abstract: Q-learning can be used to learn a control policy that maximises a scalar reward through interaction with the environment. Q-learning is commonly applied to problems with discrete states and actions. We describe a method suitable for control tasks which require continuous actions, in response to continuous states. The system consists of a neural network coupled with a novel interpolator. Simulation results are presented for a non-holonomic control task. Advantage Learning, a variation of Q-learning, is…
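
The abstract describes a neural network coupled with a novel interpolator for handling continuous actions. One well-known interpolation scheme for this setting, also named in the citation excerpts below, is wire fitting: the network outputs a small set of candidate actions with associated Q-values for the current state, and a distance-weighted interpolation defines Q(s, a) for any action in between. The sketch below shows that style of interpolator only as an illustration; the function names, the smoothing constant c, and the toy wires are assumptions, not taken from the paper.

```python
import numpy as np

def wire_fit_q(a, wires, c=0.1, eps=1e-6):
    """Interpolate Q(s, a) from a set of (action, q) 'wires' produced
    for the current state (wire-fitting style interpolation).

    a     : query action, shape (action_dim,)
    wires : list of (a_i, q_i) pairs, a_i shape (action_dim,), q_i scalar
    c     : smoothing factor trading off distance in action space vs. Q space
    eps   : small constant to avoid division by zero
    """
    actions = np.array([w[0] for w in wires])        # (n, action_dim)
    qs = np.array([w[1] for w in wires])             # (n,)
    q_max = qs.max()
    # Distance term: squared action distance plus a penalty for low-valued wires.
    dist = np.sum((actions - a) ** 2, axis=1) + c * (q_max - qs) + eps
    weights = 1.0 / dist
    return float(np.sum(weights * qs) / np.sum(weights))

def greedy_action(wires):
    # The interpolated surface attains its maximum at the highest-valued wire,
    # so the greedy continuous action needs no inner optimisation.
    return max(wires, key=lambda w: w[1])[0]

if __name__ == "__main__":
    # Toy example: three wires for a 1-D action space.
    wires = [(np.array([-1.0]), 0.2),
             (np.array([0.0]), 0.9),
             (np.array([1.0]), 0.4)]
    print(wire_fit_q(np.array([0.1]), wires))  # close to 0.9
    print(greedy_action(wires))                # array([0.])
```

Because the interpolation passes through the wires and never exceeds the largest q-value, maximising over the continuous action space reduces to picking the best wire, which is what makes this family of interpolators attractive for Q-learning with continuous actions.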

Cited by 98 publications (54 citation statements)
References 11 publications
“…Note that the agent is faced with the delayed-reward problem, and that it must take the distance to the two exits into consideration for choosing the most attractive one. The maze has a ground carpeted with a color image of 1280 × 1280 pixels, that is a montage of pictures from the COIL-100 database 5 . The agent does not have direct access to its (x, y) position in the maze.…”
Section: Results (mentioning, confidence: 99%)
“…Furthermore, an a priori discretization of the action space generally suffers from an explosion of the representational size of the domains known as the curse of dimensionality, and may introduce artificial noise. Previously-investigated solutions for handling continuous actions without a priori discretization generally use function approximators such as neural networks [3], tile coding [4], or wire fitting [5]. However, to the best of our knowledge, none of these methods can cope simultaneously with high-dimensional, discrete perceptual spaces.…”
Section: Introduction (mentioning, confidence: 99%)
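
The "explosion of the representational size" mentioned in the excerpt above is easy to make concrete: discretising each of d continuous action dimensions into k bins yields k^d joint actions. A tiny illustrative computation (the choice of k = 10 is arbitrary):

```python
# Curse of dimensionality for a priori action discretisation:
# k bins per dimension and d action dimensions give k**d joint actions.
k = 10  # illustrative number of bins per dimension
for d in (1, 2, 4, 8):
    print(f"{d} action dimension(s) -> {k**d:,} discrete actions")
# 1 -> 10, 2 -> 100, 4 -> 10,000, 8 -> 100,000,000
```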
“…Several Reinforcement Learning (RL) algorithms have addressed the problem of learning to perform well in a continuous environment that is not perfectly modeled. Model-free RL approaches, such as Q-Learning [6] and policy gradient descent [7], are capable of improving robot performance without explicitly modeling the world. While this generality is appealing and necessary in situations where modeling is impractical, learning tends to be less data-efficient and is not generalizable to different tasks within the same environment [8].…”
Section: Related Work (mentioning, confidence: 99%)
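
For reference, the Q-Learning named in the excerpt above is, in its basic model-free form, a tabular algorithm over discrete states and actions; the continuous-action methods discussed in this paper extend that baseline. A minimal sketch follows; the environment interface (reset/step) and the hyperparameter values are illustrative assumptions, not drawn from any of the cited papers.

```python
import random

def q_learning_episode(env, Q, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Run one episode of epsilon-greedy tabular Q-learning.

    Q is a mapping from (state, action) to value, e.g. a
    collections.defaultdict(float) so that unseen pairs start at zero.
    env is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done).
    """
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection over a finite action set.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)
        # Q-learning bootstraps from the greedy value of the next state,
        # regardless of the action actually taken there (off-policy update).
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
    return Q
```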
“…How best to learn multiple modes of behavior is an interesting open challenge, the impact of which is magnified by other challenges common in domains requiring intelligent behavior: partial observability (Sutton and Barto, 1998), continuous state and action spaces (Gaskett et al, 1999), and noisy evaluations. Taking on these challenges, this dissertation develops methods specifically aimed at discovering multimodal behavior.…”
Section: Challenge (mentioning, confidence: 99%)