2019 IEEE Latin American Conference on Computational Intelligence (LA-CCI)
DOI: 10.1109/la-cci47412.2019.9036763
Performing Deep Recurrent Double Q-Learning for Atari Games

Abstract: Currently, many applications of Machine Learning are based on defining new models to extract more information from data. Deep Reinforcement Learning, best known through video games such as Atari, Mario, and others, has changed how computers can learn by themselves using only the rewards obtained from their actions. Many algorithms have been modeled and implemented based on the Deep Recurrent Q-Learning proposed by DeepMind and used in AlphaZero and Go. In this documen…
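The "learning only from rewards" the abstract refers to is the Q-learning update that the paper's Deep Recurrent Double Q-Learning builds on. Below is a minimal tabular sketch of that idea; it is not code from the paper, and the state/action counts, learning rate, and discount factor are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not from the paper): tabular Q-learning, i.e. learning only
# from rewards obtained after each action. Deep (Recurrent/Double) Q-Learning
# replaces this table with neural networks.
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.99          # learning rate and discount factor (illustrative)
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next, done):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```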

Cited by 19 publications (8 citation statements). References 5 publications.
“…In COMPER, update values R(τ_t, Ω) used to obtain target values are provided by a recurrent neural network (RNN) parameterized by Ω. That is completely different from some approaches in the literature that have proposed the adoption of recurrent units at the final layers of the target network, such as Hausknecht and Stone (2015) and Moreno-Vera (2019). Here, an RNN is adopted not only to predict values that are used to calculate target values during training but also to build a model that is explored to generate the compact structure of RTM representing previous experiences.…”
Section: Methods Outline
confidence: 99%
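For contrast with the COMPER design described in the statement above, the following is a minimal sketch of what "recurrent units at the final layers" of a Q-network can look like, in the spirit of Hausknecht and Stone (2015) and Moreno-Vera (2019). It assumes a PyTorch-style convolutional front end over single 84x84 frames followed by an LSTM; all layer sizes are illustrative and not taken from either paper.

```python
import torch.nn as nn

# Sketch: Q-network whose final layers are recurrent, so temporal context comes
# from the LSTM state rather than from stacking frames at the input.
class RecurrentQNetwork(nn.Module):
    def __init__(self, n_actions: int, hidden: int = 512):
        super().__init__()
        self.conv = nn.Sequential(                      # convolutional front end over one frame
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.lstm = nn.LSTM(64 * 7 * 7, hidden, batch_first=True)  # recurrence at the end
        self.head = nn.Linear(hidden, n_actions)                   # one Q-value per action

    def forward(self, frames, hx=None):
        # frames: (batch, time, 1, 84, 84) -> Q-values: (batch, time, n_actions)
        b, t = frames.shape[:2]
        feats = self.conv(frames.flatten(0, 1)).view(b, t, -1)
        out, hx = self.lstm(feats, hx)
        return self.head(out), hx
```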
“…They stated that a recurrent network is a viable approach for dealing with observations from multiple states, but it presents no systematic benefits compared to stacking these observations in the input layer of a plain CNN. Moreno-Vera (2019) proposed a similar approach but using DDQN instead of DQN. Wang et al (2016) proposed an architecture named Dueling Network in which they used two parallel streams (instead of a single sequence of fully connected layers) just after the convolutional layers, that are combined in the output by an aggregation layer to produce the estimates of Q-values.…”
Section: Literature Review and Related Work
confidence: 99%
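The dueling aggregation described in the statement above can be sketched as follows, assuming PyTorch and an already-computed feature vector from the shared convolutional layers; the hidden size is an illustrative assumption, not Wang et al.'s exact configuration.

```python
import torch.nn as nn

# Sketch of the dueling head: two parallel streams after the shared features,
# combined by an aggregation layer as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
class DuelingHead(nn.Module):
    def __init__(self, feature_dim: int, n_actions: int, hidden: int = 512):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feature_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))              # state-value stream V(s)
        self.advantage = nn.Sequential(nn.Linear(feature_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, n_actions))  # advantage stream A(s, a)

    def forward(self, features):
        v = self.value(features)                       # (batch, 1)
        a = self.advantage(features)                   # (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)     # aggregation layer producing Q-values
```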
See 1 more Smart Citation
“…In order to reduce overestimations, Hasselt et al. designed the DDQN [25] from the idea of double Q-learning [26,27]. The online network and the target network are designed to decouple the selection from the evaluation.…”
Section: DQN and DDQN
confidence: 99%
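A minimal sketch of that decoupling, assuming PyTorch and already-constructed online and target networks (names here are hypothetical): the online network selects the next action and the target network evaluates it.

```python
import torch

# Sketch of the Double DQN target: selection by the online network,
# evaluation by the target network. Only the target computation is shown.
@torch.no_grad()
def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    next_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # selection (online)
    next_q = target_net(next_states).gather(1, next_actions).squeeze(1)  # evaluation (target)
    return rewards + gamma * (1.0 - dones.float()) * next_q
```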
“…[19]. Since then, new ideas have been proposed, and the learning performance of RL using RNNs has improved dramatically [20,21,22]. However, these algorithms tend to be more complex and computationally expensive, and the vanishing/exploding gradient problem still remains.…”
Section: Introduction
confidence: 99%