2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros51168.2021.9636140

Memory-based Deep Reinforcement Learning for POMDPs

Cited by 47 publications (52 citation statements)
References 18 publications

“…We conducted experiments on the Pendulum control problem, where the pendulum must learn to swing up and stay upright during every episode. Unlike the original fully observable version, each dimension of every state is set to zero with probability p_miss = 0.1 when the agent receives an observation (Meng, Gorbet, and Kulic 2021). This random-missing setting induces partial observability and makes a simple control problem challenging.…”
Section: Pendulum - Random Missing Version
confidence: 99%
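
A minimal sketch of the random-missing observation scheme quoted above, written as a Gym-style observation wrapper. The wrapper name and the environment ID "Pendulum-v1" are illustrative assumptions, not taken from the cited code.

```python
# Sketch only: zero out each observation dimension independently with
# probability p_miss, inducing partial observability as described above.
import gym
import numpy as np

class RandomMissingWrapper(gym.ObservationWrapper):
    """Drops (zeroes) each observation dimension with probability p_miss."""

    def __init__(self, env, p_miss=0.1):
        super().__init__(env)
        self.p_miss = p_miss

    def observation(self, obs):
        obs = np.asarray(obs, dtype=np.float32)
        # Each dimension is kept with probability 1 - p_miss and zeroed otherwise.
        mask = np.random.rand(*obs.shape) >= self.p_miss
        return obs * mask

# Illustrative usage (environment ID assumed):
env = RandomMissingWrapper(gym.make("Pendulum-v1"), p_miss=0.1)
```
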
“…The information from the past should be extracted and exploited during the learning phase to compensate for the information loss due to partial observability. Partially observable situations are prevalent in real-world problems such as control tasks where observations are noisy, part of the underlying state information is deleted, or long-term information needs to be estimated (Han, Doya, and Tani 2020b; Meng, Gorbet, and Kulic 2021).…”
Section: Introduction
confidence: 99%
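
The idea quoted above, extracting information from past observations to compensate for partial observability, is commonly realized with a recurrent network. Below is a hedged sketch assuming PyTorch; the class name and layer sizes are illustrative.

```python
# Sketch only: an LSTM summarizes the observation history into a hidden
# state that a policy or value network can condition on.
import torch
import torch.nn as nn

class RecurrentEncoder(nn.Module):
    def __init__(self, obs_dim, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim). The recurrent state carries past
        # information forward, acting as a learned memory of the history.
        out, hidden = self.lstm(obs_seq, hidden)
        return out, hidden

# Illustrative usage: a batch of 8 sequences of 50 three-dimensional observations.
encoder = RecurrentEncoder(obs_dim=3)
features, hidden = encoder(torch.zeros(8, 50, 3))
```
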
“…Hausknecht & Stone (2015) extended the Deep Q-Network (DQN) algorithm and examined how episodes and their hidden states are sampled from the experience replay buffer. More literature is available on adding recurrence to variants of the Deep Deterministic Policy Gradient (DDPG) algorithm (Heess et al., 2015; Meng et al., 2021). In particular, Ni et al. (2021) proposed a recurrent Twin-Delayed DDPG implementation and described how their replay buffer is made more memory-efficient.…”
Section: Applications of Recurrent Layers in DRL
confidence: 99%
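
Recurrent DDPG/TD3 variants like those cited above typically sample sequences rather than single transitions so that hidden states can be unrolled. The sketch below shows an episode-level replay buffer under that assumption; all names are illustrative and not drawn from the cited implementations.

```python
# Sketch only: store whole episodes and sample fixed-length sub-sequences
# for training recurrent actor-critic agents.
import random
from collections import deque

class EpisodeReplayBuffer:
    def __init__(self, capacity=1000):
        self.episodes = deque(maxlen=capacity)  # each entry is one episode

    def add_episode(self, transitions):
        # transitions: list of (obs, action, reward, next_obs, done) tuples
        self.episodes.append(list(transitions))

    def sample(self, batch_size, seq_len):
        # Return batch_size sub-sequences of length <= seq_len; recurrent
        # hidden states are re-initialized at the start of each sub-sequence.
        batch = []
        for _ in range(batch_size):
            episode = random.choice(self.episodes)
            start = random.randint(0, max(0, len(episode) - seq_len))
            batch.append(episode[start:start + seq_len])
        return batch
```
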
“…We conducted experiments on the Pendulum control problem (Brockman et al. 2016), where the pendulum must learn to swing up and stay upright during every episode. Unlike the original fully observable version, each dimension of every state is set to zero with probability p_miss = 0.1 when the agent receives an observation (Meng, Gorbet, and Kulic 2021). This random-missing setting induces partial observability and makes a simple control problem challenging.…”
Section: Pendulum - Random Missing Version
confidence: 99%
“…We focused on the performance comparison within a fixed sample size rather than the average runtime or estimated energy cost. We used the code of the Mountain Hike environment from (Igl et al. 2018), and we modified the POMDP wrapper code provided by (Meng, Gorbet, and Kulic 2021) for the considered Pendulum environment. The episode length is 200 timesteps in the considered Pendulum environment, and the maximum episode length is 128 timesteps in the sequential target-reaching task.…”
Section: Details in Implementation
confidence: 99%
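
A small usage sketch of the episode-length setup described above: Pendulum episodes capped at 200 timesteps. The environment ID and the classic four-tuple Gym step API are assumptions; a POMDP wrapper such as the one sketched earlier would be applied on top of this environment in practice.

```python
# Sketch only: cap episodes at 200 timesteps and roll out a random policy.
import gym
from gym.wrappers import TimeLimit

env = TimeLimit(gym.make("Pendulum-v1"), max_episode_steps=200)

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # placeholder for a learned policy
    obs, reward, done, info = env.step(action)  # terminates after at most 200 steps
```
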