2018
DOI: 10.1007/978-3-030-01054-6_1

ViZDoom: DRQN with Prioritized Experience Replay, Double-Q Learning and Snapshot Ensembling

Abstract: ViZDoom is a robust, first-person shooter reinforcement learning environment, characterized by a significant degree of latent state information. In this paper, double-Q learning and prioritized experience replay methods are tested under a certain ViZDoom combat scenario using a competitive deep recurrent Q-network (DRQN) architecture. In addition, an ensembling technique known as snapshot ensembling is employed using a specific annealed learning rate to observe differences in ensembling efficacy under these tw…
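To make the ensembling step concrete, here is a minimal sketch of snapshot ensembling with a cyclically annealed learning rate, in the spirit the abstract describes. The exact schedule used in the paper is not given in this excerpt; `train_step`, `model.copy`, and `model.q_values` are hypothetical stand-ins, not the authors' code.

import math

def cyclic_cosine_lr(step, steps_per_cycle, lr_max):
    # Cosine-annealed learning rate, restarted at the start of every cycle.
    t = step % steps_per_cycle
    return 0.5 * lr_max * (1.0 + math.cos(math.pi * t / steps_per_cycle))

def train_with_snapshots(model, train_step, total_steps, num_snapshots, lr_max=1e-3):
    # Save one snapshot of the network at the end of each annealing cycle,
    # when the learning rate has decayed to (near) zero.
    steps_per_cycle = total_steps // num_snapshots
    snapshots = []
    for step in range(total_steps):
        lr = cyclic_cosine_lr(step, steps_per_cycle, lr_max)
        train_step(model, lr)                   # one gradient update of the DRQN (placeholder)
        if (step + 1) % steps_per_cycle == 0:
            snapshots.append(model.copy())      # frozen copy of the current weights
    return snapshots

def ensemble_q_values(snapshots, observation):
    # The ensemble's estimate is the average of the snapshots' per-action Q-values.
    per_model = [m.q_values(observation) for m in snapshots]
    return [sum(q) / len(per_model) for q in zip(*per_model)]

The appeal of this scheme is that the snapshots are collected in a single training run, so the ensemble costs no extra training time beyond storing the intermediate weights.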

Cited by 24 publications (17 citation statements) | References 10 publications
“…Human and animal neural evidence supports a role for wake and sleep memory replay in generalization and community detection [59,60] , inference [52,61] especially of unseen relations among states, problem solving, and memory consolidation [62,63] . Deep reinforcement learning algorithms also benefit from prioritized replay in generalization, discovery, and adversarial self-learning [64][65][66] . Different replay algorithms differ with regards to how memories are stored and the model from which they generate the experience.…”
Section: Learning Structures Via Replay and Prioritization (mentioning)
confidence: 99%
“…We may find that these “replays” preferentially included the decision point at the middle of the track, rather than the ends of the track. We may be tempted to report this as a finding of interest, perhaps with an interpretation emphasizing prioritized replay as a mechanism useful for reinforcement learning (Schaul et al, ; Gershman and Daw, ). However, Figure should make it clear that, in the data set used here, such a bias is a straightforward consequence of the increase in cross‐validated decoding error at both ends of the track.…”
Section: Discussion (mentioning)
confidence: 98%
“…The first is that there are strong correlations among the incoming data, which may break the assumptions of many popular stochastic gradient-based algorithms. The second is that minor changes in the Q-function may result in a large change in the policy, which makes the algorithm difficult to converge [7,9,14,15].…”
Section: Target Deep Learning (mentioning)
confidence: 99%
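The two problems named in this excerpt are exactly what experience replay and a target network address in DQN-style agents. A minimal sketch under that assumption follows; `q_target` is a hypothetical callable returning a list of Q-values for a state, not code from the cited work.

import random
from collections import deque

class ReplayBuffer:
    # Sampling random minibatches from a large buffer breaks the temporal
    # correlation between consecutive transitions.
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def td_targets(batch, q_target, gamma=0.99):
    # Targets are computed with a separate, periodically synced target network,
    # so small updates to the online Q-function do not immediately shift the
    # regression target and destabilize training.
    targets = []
    for state, action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_target(next_state))
        targets.append(reward + bootstrap)
    return targets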
“…The convergence issue was mentioned in 2015 by Schaul et al. [14]. The above Q-learning update rules can be directly implemented in a neural network.…”
Section: Target Deep Learning (mentioning)
confidence: 99%
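For reference, one standard form of the Q-learning update this excerpt alludes to, together with the squared-error loss through which it is typically implemented in a neural network (with \(\theta^{-}\) denoting target-network parameters); the notation in the cited paper may differ.

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

L(\theta) = \mathbb{E}_{(s,a,r,s')} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]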