
The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
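The difference between the two bootstrap targets can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the commonly described Double DQN form, in which the online network selects the next action and the target network evaluates it. The function names and the plain-list representation of action values are hypothetical.

```python
def q_learning_target(reward, gamma, next_q_target, done):
    """Standard DQN-style target: one set of values both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_q_target(reward, gamma, next_q_online, next_q_target, done):
    """Double Q-learning-style target (sketch).

    The online values choose the greedy action; the target values score it.
    Decoupling selection from evaluation is what reduces the upward bias.
    """
    if done:
        return reward
    # Greedy action according to the online estimates (assumed representation).
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

Because the evaluated action is not necessarily the one that maximizes the target values, the double target can never exceed the standard one for the same inputs.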


Multi-Task Deep Reinforcement Learning with PopArt

Hessel

Soyer

Espeholt

et al. 2019

AAAI

329

388


The reinforcement learning community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, with each new task requiring a brand new agent instance to be trained. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state of the art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy, with a single set of weights, that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state of the art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
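One way to picture "adapting the contribution of each task" is a per-task normalizer in the style of PopArt: targets are rescaled by running statistics, and the output layer is adjusted so that the unnormalized predictions do not change when the statistics move. The sketch below is an assumption-laden illustration with a scalar linear head; the class name `PopArtHead`, the step size `beta`, and the initial statistics are hypothetical choices, not values taken from the paper.

```python
import math


class PopArtHead:
    """Scalar linear head with adaptive target normalization (sketch)."""

    def __init__(self, beta=0.1):
        self.beta = beta  # step size for the running statistics (assumed)
        self.mu = 0.0     # running mean of targets
        self.nu = 1.0     # running second moment of targets
        self.w = 1.0      # weight of the normalized output layer
        self.b = 0.0      # bias of the normalized output layer

    @property
    def sigma(self):
        # Standard deviation from the two moments, floored for stability.
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def normalized(self, feature):
        return self.w * feature + self.b

    def unnormalized(self, feature):
        return self.sigma * self.normalized(feature) + self.mu

    def update_stats(self, target):
        """Update statistics, then rescale w and b to preserve outputs."""
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        self.w = self.w * old_sigma / new_sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma

    def normalize_target(self, target):
        # Loss would be computed against this rescaled target.
        return (target - self.mu) / self.sigma
```

With one such head per task, a task with large rewards and a task with small rewards both produce normalized targets of comparable scale, so neither dominates the shared network's gradient updates.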


Dueling Network Architectures for Deep Reinforcement Learning

Wang

Schaul

Hessel

et al. 2015

Preprint

216

293


Rainbow: Combining Improvements in Deep Reinforcement Learning

Hessel

Modayil

Hasselt

et al. 2018

AAAI

734

275


The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance.


Deep Reinforcement Learning with Double Q-learning

Hasselt

Guez

Silver

2015

Preprint

127

177


A theoretical and empirical analysis of Expected Sarsa

Hasselt

et al. 2009


Rainbow: Combining Improvements in Deep Reinforcement Learning

Hessel

Modayil

Hasselt

et al. 2017

Preprint

146


Distributed Prioritized Experience Replay

Horgan

Quan

Budden

et al. 2018

Preprint

119

137


We propose a distributed architecture for deep reinforcement learning at scale that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a shared experience replay memory; the learner replays samples of experience and updates the neural network. The architecture relies on prioritized experience replay to focus only on the most significant data generated by the actors. Our architecture substantially improves the state of the art on the Arcade Learning Environment, achieving better final performance in a fraction of the wall-clock training time.
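The shared, prioritized replay memory described here can be sketched as a small in-process buffer. This is a simplified illustration under stated assumptions: it uses proportional prioritization (sampling probability proportional to priority raised to an exponent `alpha`), a plain list instead of an efficient tree structure, and no importance-sampling correction. The class and method names are hypothetical, and the distributed actors and learner are not modeled.

```python
import random


class PrioritizedReplay:
    """Toy proportional prioritized replay buffer (sketch)."""

    def __init__(self, alpha=0.6, seed=0):
        self.alpha = alpha            # how strongly priorities skew sampling
        self.items = []               # stored transitions
        self.priorities = []          # one priority per transition
        self.rng = random.Random(seed)

    def add(self, transition, priority):
        # Actors would call this with an initial priority for new experience.
        self.items.append(transition)
        self.priorities.append(priority)

    def probabilities(self):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size):
        # The learner draws indices in proportion to the scaled priorities.
        indices = self.rng.choices(
            range(len(self.items)), weights=self.probabilities(), k=batch_size
        )
        return [(i, self.items[i]) for i in indices]

    def update_priority(self, index, priority):
        # After a learning step, the learner writes back a fresh priority.
        self.priorities[index] = priority
```

In this picture, actors only call `add`, while the learner alternates between `sample` and `update_priority`, which mirrors the separation of acting from learning in the abstract.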



Hado van Hasselt
