2020
DOI: 10.48550/arxiv.2010.08755
Preprint

Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning

Chenjia Bai,
Peng Liu,
Kaiyu Liu
et al.

Abstract: Efficient exploration remains a challenging problem in reinforcement learning, especially for tasks where extrinsic rewards from environments are sparse or even totally disregarded. Significant advances based on intrinsic motivation show promising results in simple environments but often get stuck in environments with multimodal and stochastic dynamics. In this work, we propose a variational dynamic model based on the conditional variational inference to model the multimodality and stochasticity. We consider t…
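As a rough illustration of the kind of conditional variational dynamic model the abstract describes, the sketch below trains a latent-variable transition model with a posterior q(z | s, a, s'), a conditional prior p(z | s, a), and a decoder p(s' | s, a, z). It is a minimal sketch in PyTorch under assumed architectures, sizes, and names, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): a conditional VAE over transitions,
# modelling p(s' | s, a) with a latent variable z. All sizes are assumptions.
import torch
import torch.nn as nn

class ConditionalVAEDynamics(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=32, hidden=256):
        super().__init__()
        # Posterior q(z | s, a, s') over the latent transition variable.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim * 2 + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),
        )
        # Conditional prior p(z | s, a).
        self.prior = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),
        )
        # Decoder p(s' | s, a, z), here predicting the mean of the next state.
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + action_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s, a, s_next):
        # Gaussian parameters of the posterior and the conditional prior.
        q_mu, q_logvar = self.encoder(torch.cat([s, a, s_next], -1)).chunk(2, -1)
        p_mu, p_logvar = self.prior(torch.cat([s, a], -1)).chunk(2, -1)
        # Reparameterised sample from the posterior.
        z = q_mu + torch.randn_like(q_mu) * (0.5 * q_logvar).exp()
        s_pred = self.decoder(torch.cat([s, a, z], -1))
        # Negative ELBO = reconstruction error + KL(q || p), per sample.
        recon = ((s_pred - s_next) ** 2).sum(-1)
        kl = 0.5 * (p_logvar - q_logvar
                    + (q_logvar.exp() + (q_mu - p_mu) ** 2) / p_logvar.exp()
                    - 1).sum(-1)
        return recon + kl
```

Minimising the returned negative ELBO over observed transitions fits the model; because the prior is conditioned on (s, a), multiple plausible next states can be captured by different latent samples.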


Cited by 2 publications (3 citation statements).
References 12 publications.
“…Several exploration strategies use a dynamics model to provide intrinsic rewards (Pathak et al 2017; Burda et al 2019b; Houthooft et al 2016; Pathak, Gandhi, and Gupta 2019; Kim et al 2019). Latent variable dynamics have also been studied for exploration (Bai et al 2020; Bucher et al 2019; Tao, Francois-Lavet, and Pineau 2020). Maximum entropy in the state representation has been used for exploration, through random encoders in RE3 (Seo et al 2021), and prototypical representations in ProtoRL (Yarats et al 2021).…”
Section: Related Work
confidence: 99%
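The statement above groups together methods that derive an intrinsic reward from a learned dynamics model. As a generic illustration of that idea (a sketch under assumed names and sizes, not code from any of the cited papers), a forward model can be trained on observed transitions and its prediction error used as a curiosity bonus:

```python
# Sketch of a prediction-error intrinsic reward from a learned forward model.
# All module names and dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class ForwardDynamics(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s, a):
        # Predict the next state from the current state and action.
        return self.net(torch.cat([s, a], dim=-1))

def intrinsic_reward(model, s, a, s_next):
    # Reward is the model's squared prediction error: high where the
    # dynamics are still poorly modelled, encouraging exploration there.
    with torch.no_grad():
        return ((model(s, a) - s_next) ** 2).mean(dim=-1)
```

Such deterministic error-based bonuses are exactly what the later citation statements contrast with latent-variable (variational) dynamics models, which handle stochastic transitions more gracefully.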
“…These works can be technically classified into three categories: 1) methods that estimate prediction errors of the environmental dynamics; 2) methods that estimate the state novelty; 3) methods based on information gain. Table II presents the characteristics of all reviewed intrinsic motivation-oriented exploration algorithms in terms of whether each can apply to continuous control problems, and whether it can solve the white-noise and long-horizon problems described in Section III.…”
[Table II residue: the extracted text interleaves the table's method entries, e.g. ICM [70], Curiosity-Driven [71], AR4E [106], VDM [107], EMI [108], RND [64], EX2 [118], VIME [125], Disagreement [127], MAX [128], with per-column ratings such as “high” and “partially”; the table layout could not be recovered here.]
Section: B. Intrinsic Motivation-oriented Exploration
confidence: 99%
“…Meanwhile, methods based on prediction errors are limited to deterministic environments and do not show much advantage in stochastic environments. Later, VDM [107] learns the stochasticity in the transition model through a variational dynamics model, and measures novelty through the evidence lower bound.…”
Section: Prediction Error
confidence: 99%
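As a hedged sketch of how such an ELBO-based novelty measure is commonly written (assumed notation; the exact form used by VDM may differ), with a latent-variable transition model p(s_{t+1} | s_t, a_t, z), posterior q(z | s_t, a_t, s_{t+1}), and conditional prior p(z | s_t, a_t):

```latex
% Evidence lower bound for a single transition (s_t, a_t, s_{t+1});
% notation assumed for illustration, not necessarily VDM's exact formulation.
\begin{align}
\mathcal{L}(s_t, a_t, s_{t+1})
  &= \mathbb{E}_{q(z \mid s_t, a_t, s_{t+1})}
       \left[ \log p(s_{t+1} \mid s_t, a_t, z) \right] \notag \\
  &\quad - D_{\mathrm{KL}}\!\left( q(z \mid s_t, a_t, s_{t+1})
       \,\middle\|\, p(z \mid s_t, a_t) \right).
\end{align}
```

An intrinsic reward can then be taken to be large where the bound is low, e.g. r_t^int ∝ −L(s_t, a_t, s_{t+1}), so that transitions the model explains poorly are visited more often, while purely stochastic transitions the model has already captured are not over-rewarded.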