Q-learning for POMDP: An application to learning locomotion gaits

Wang, Tixian; Taghvaei, Amirhossein; Mehta, Prashant G.

doi:10.1109/cdc40024.2019.9030143

Cited by 4 publications

(4 citation statements)

References 26 publications

“…Few works consider the more complex Partially Observable Markov Decision Process (POMDP) where the observation is just a partial representation of the underlying state. However, POMDP is ubiquitous in real robotics applications [16], [17], such as robot navigation [18], [19], robotic manipulation [20], autonomous driving [21], [22], [23], and planning under uncertainty [24], [25], [26], [27]. Partial observability may be due to limited sensing capability, or an incomplete system model resulting in uncertainty about full observability.…”

Section: Introduction (mentioning)

confidence: 99%

Memory-based Deep Reinforcement Learning for POMDPs

Meng

Gorbet

Kulić

2021

Preprint

A promising characteristic of Deep Reinforcement Learning (DRL) is its capability to learn an optimal policy in an end-to-end manner without relying on feature engineering. However, most approaches assume a fully observable state space, i.e., a fully observable Markov Decision Process (MDP). In real-world robotics, this assumption is impractical because of sensor issues, such as limited sensor capacity and sensor noise, and because it is often unknown whether the observation design is complete. These scenarios lead to a Partially Observable MDP (POMDP) and need special treatment. In this paper, we propose Long-Short-Term-Memory-based Twin Delayed Deep Deterministic Policy Gradient (LSTM-TD3), which introduces a memory component to TD3, and compare its performance with other DRL algorithms in both MDPs and POMDPs. Our results demonstrate the significant advantages of the memory component in addressing POMDPs, including the ability to handle missing and noisy observation data.

“…This overall control system can be viewed as a central pattern generator (CPG) which integrates sensory information to learn closed-loop optimal control policies for biolocomotion. The framework presented here is based upon our prior research in [10] where phase reduction technique was introduced for a 2-link system and in [13] where the technique was extended to include learning for the 2-link system. The main contributions of this work over and above these prior publications are as follows:…”

Section: Introduction (mentioning)

confidence: 99%

“…1) The application involving the snake robot is new and more practically motivated than the simple 2-link model considered in [13].…”

Section: Introduction (mentioning)

confidence: 99%

Bio-inspired Learning of Sensorimotor Control for Locomotion

Wang

Taghvaei

Mehta

2019

Preprint

Self Cite

This paper presents a bio-inspired central pattern generator (CPG)-type architecture for learning optimal maneuvering control of periodic locomotory gaits. The architecture is presented here with the aid of a snake robot model problem involving planar locomotion of coupled rigid body systems. The maneuver involves clockwise or counterclockwise turning from a nominally straight path. The CPG circuit is realized as a coupled oscillator feedback particle filter. The collective dynamics of the filter are used to approximate a posterior distribution that is used to construct the optimal control input for maneuvering the robot. A Q-learning algorithm is applied to learn the approximate optimal control law. The issues surrounding the parametrization of the Q-function are discussed. The theoretical results are illustrated with numerics for a 5-link snake robot system.
