2019 IEEE 58th Conference on Decision and Control (CDC) 2019
DOI: 10.1109/cdc40024.2019.9030143

Q-learning for POMDP: An application to learning locomotion gaits

Abstract: This paper presents a Q-learning framework for learning optimal locomotion gaits in robotic systems modeled as coupled rigid bodies. Inspired by the prevalence of periodic gaits in bio-locomotion, an open-loop periodic input is assumed to (say) affect a nominal gait. The learning problem is to learn a new (modified) gait using only partial, noisy measurements of the state. The objective of learning is to maximize a given reward, modeled as an objective function in optimal control settings. The proposed control ar…

Cited by 4 publications (4 citation statements)
References 26 publications
“…Few works consider the more complex Partially Observable Markov Decision Process (POMDP) where the observation is just a partial representation of the underlying state. However, POMDP is ubiquitous in real robotics applications [16], [17], such as robot navigation [18], [19], robotic manipulation [20], autonomous driving [21], [22], [23], and planning under uncertainty [24], [25], [26], [27]. Partial observability may be due to limited sensing capability, or an incomplete system model resulting in uncertainty about full observability.…”
Section: Introduction (mentioning)
confidence: 99%
“…This overall control system can be viewed as a central pattern generator (CPG) which integrates sensory information to learn closed-loop optimal control policies for biolocomotion. The framework presented here is based upon our prior research in [10] where phase reduction technique was introduced for a 2-link system and in [13] where the technique was extended to include learning for the 2-link system. The main contributions of this work over and above these prior publications are as follows:…”
Section: Introduction (mentioning)
confidence: 99%
“…1) The application involving the snake robot is new and more practically motivated than the simple 2-link model considered in [13].…”
Section: Introduction (mentioning)
confidence: 99%