2021
DOI: 10.48550/arxiv.2102.07920
Preprint

Training Larger Networks for Deep Reinforcement Learning

Cited by 10 publications (13 citation statements)
References 18 publications

“…Secondly, the network was made deeper, for reasons that will be detailed later. Deeper networks however tend to overfit and therefore to be unhelpful in DRL [28]. In computer vision, this problem is addressed with batch normalization, which was shown to smooth the optimization landscape, stabilizing gradient estimation [29].…”
Section: Experiments and Discussion
confidence: 99%
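
The remedy this excerpt points to, wrapping each convolutional layer with batch normalization as in [29], is easy to illustrate. Below is a minimal sketch, assuming PyTorch; the layer widths and the 84x84 input size are illustrative assumptions, not values taken from the cited papers.

```python
# Minimal sketch (PyTorch assumed): a deeper convolutional encoder whose blocks
# include batch normalization, the computer-vision remedy referenced in [29].
# All layer sizes here are illustrative, not taken from the cited papers.
import torch
import torch.nn as nn

def conv_bn_block(in_ch, out_ch):
    # Conv -> BatchNorm -> ReLU: batch normalization smooths the optimization
    # landscape and stabilizes gradient estimation when many blocks are stacked.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class DeepEncoder(nn.Module):
    def __init__(self, in_channels=3, widths=(32, 64, 64, 128)):
        super().__init__()
        blocks, prev = [], in_channels
        for w in widths:
            blocks.append(conv_bn_block(prev, w))
            prev = w
        self.features = nn.Sequential(*blocks)

    def forward(self, x):
        return self.features(x)

# Example: a batch of 84x84 RGB observations.
if __name__ == "__main__":
    obs = torch.randn(8, 3, 84, 84)
    print(DeepEncoder()(obs).shape)  # torch.Size([8, 128, 6, 6])
```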
“…Specifically, we perform an in-depth comparison of the performance of PQCs and NNs with varying numbers of parameters on the Cart Pole environment. We show that recent results in classical deep Q-learning also apply to the case when a PQC is used as the function approximator, namely that increasing the number of parameters is only beneficial up to some point [52]. After this, learning becomes increasingly unstable for both PQCs and NNs.…”
Section: Introduction
confidence: 86%
“…However, the improvement between 10 and 15 layers is relatively small compared to that between 5 and 10 layers, similar to the saturation in performance w.r.t. the number of parameters found in classical deep RL [52]. We will study this type of scaling behaviour more in-depth and compare it to that of NNs in section 5.2.…”
Section: Frozen Lake
confidence: 95%
“…To avoid this issue, instead of using existing models available on the Internet, we train a new encoder from scratch with images from ImageNet shrunk to 84x84. To improve computation efficiency and to avoid the difficulties of training deep networks in DRL (Bjorck et al, 2021; Ota et al, 2021), we use a light-weight encoder with only 5 convolutional layers, which is 50 times smaller than the ResNet34 used in RRL. This allows us to perform experimentation at a much faster pace.…”
Section: Stage 1: Pretraining With Non-RL Data
confidence: 99%
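
As a rough illustration of the encoder described in this excerpt, here is a hedged sketch, assuming PyTorch, of a light-weight 5-convolutional-layer encoder for 84x84 inputs. The channel widths, kernel sizes, and resulting parameter count are guesses for illustration only; the excerpt specifies just the layer count and the roughly 50x size gap to ResNet34.

```python
# Hypothetical sketch (PyTorch assumed) of a light-weight encoder with only
# 5 convolutional layers for 84x84 images, in the spirit of the encoder
# described above. Channel widths and kernel sizes are illustrative guesses;
# the cited work specifies only the layer count and that the encoder is
# roughly 50x smaller than ResNet34.
import torch
import torch.nn as nn

class LightEncoder(nn.Module):
    def __init__(self, in_channels=3, feature_dim=256):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Infer the flattened size from a dummy 84x84 input, then project.
        with torch.no_grad():
            n_flat = self.convs(torch.zeros(1, in_channels, 84, 84)).numel()
        self.proj = nn.Linear(n_flat, feature_dim)

    def forward(self, x):
        return self.proj(self.convs(x).flatten(start_dim=1))

if __name__ == "__main__":
    enc = LightEncoder()
    n_params = sum(p.numel() for p in enc.parameters())
    print(n_params)  # a few hundred thousand parameters vs. ~21M for ResNet34
    print(enc(torch.randn(4, 3, 84, 84)).shape)  # torch.Size([4, 256])
```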