2019
DOI: 10.48550/arxiv.1912.02503
Preprint

Hindsight Credit Assignment

Abstract: We consider the problem of efficient credit assignment in reinforcement learning. In order to efficiently and meaningfully utilize new data, we propose to explicitly assign credit to past decisions based on the likelihood of them having led to the observed outcome. This approach uses new information in hindsight, rather than employing foresight. Somewhat surprisingly, we show that value functions can be rewritten through this lens, yielding a new family of algorithms. We study the properties of these algorithm…
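
The abstract's key claim, that value functions can be rewritten in terms of the hindsight likelihood of past decisions, can be made concrete with a short identity. The sketch below gives the return-conditioned form of such a rewrite; the notation (state x, return Z, hindsight distribution h) is supplied here for illustration and is not fixed by the truncated abstract itself.

    % Sketch: hindsight rewrite of the action-value function.
    % Z is the random return from state x under policy \pi, and h(a | x, z)
    % is the hindsight probability that action a was taken at x given that
    % return z was observed.
    Q^{\pi}(x, a)
      = \mathbb{E}_{Z \sim P^{\pi}(\cdot \mid x)}
        \left[ \frac{h(a \mid x, Z)}{\pi(a \mid x)} \, Z \right],
    \qquad
    h(a \mid x, z) = \frac{\pi(a \mid x) \, P^{\pi}(z \mid x, a)}{P^{\pi}(z \mid x)} .

Substituting the Bayes-rule definition of h collapses the ratio to P^pi(z | x, a) / P^pi(z | x), so the expectation reduces to E[Z | x, a] = Q^pi(x, a): an action earns credit exactly in proportion to how much more likely it becomes once the outcome is known.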

Cited by 3 publications (3 citation statements)
References 4 publications

“…The reward function is an important part of deep reinforcement learning, as it determines the speed and degree of convergence of reinforcement learning algorithms. Work on reward shaping [39, 40] has studied the problem that the agent does not recognize key actions and is not motivated to explore in more complex scenarios. With sparse rewards, a meaningful reward signal is unavailable most of the time during training, and without feedback it is difficult for the agent to learn in the direction of the goal.…”
Section: Materials and Methods
confidence: 99%
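
The reward-shaping references [39, 40] are not quoted here, so the snippet below is only a generic illustration of the technique the passage names: potential-based shaping, which adds gamma * phi(s') - phi(s) to the environment reward to supply dense per-step feedback while provably preserving optimal policies (Ng et al., 1999). The potential function phi is a hypothetical stand-in for this sketch.

    # Illustrative potential-based reward shaping (a generic sketch, not
    # the specific methods of refs [39, 40]; `phi` is a hypothetical
    # potential function chosen for this example).

    def shaped_reward(r, s, s_next, phi, gamma=0.99, done=False):
        """Add a dense shaping term to a sparse environment reward.

        Shaping of the form gamma * phi(s') - phi(s) gives the agent
        feedback on every step while leaving optimal policies unchanged
        (Ng et al., 1999).
        """
        next_potential = 0.0 if done else phi(s_next)
        return r + gamma * next_potential - phi(s)

    # Example: a 1-D corridor with the goal at position 10. The potential
    # is the negative distance to the goal, so moving closer yields an
    # immediate positive signal even though the environment reward is 0.
    phi = lambda s: -abs(10 - s)
    print(shaped_reward(0.0, s=4, s_next=5, phi=phi))   # ~1.05 > 0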
“…[46][47][48] Rewards may be sparse, especially with delayed feedback, and the benefit of intermediate actions may not be immediately obvious.…”
Section: Adaptive Management Challenge RL Methods and Concepts Citati…
confidence: 99%
“…Many works have studied better credit assignment via state-association, learning an architecture which decomposes the reward function such that certain "important" states comprise most of the credit [50, 51, 12]. They use the learned reward function to change the reward of an actor-critic algorithm to help propagate signal over long horizons.…”
Section: Credit Assignment
confidence: 99%
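
As a minimal sketch of the general idea the quote describes, redistributing an episode's return across its steps through a learned model, the snippet below differences a return predictor's running outputs to obtain dense per-step credit. The names ReturnPredictor and redistributed_rewards are hypothetical, and this is not the specific architecture of [50, 51, 12].

    # Sketch of return decomposition / reward redistribution. A model
    # predicts the episode return from each trajectory prefix; changes in
    # that prediction serve as per-step rewards, so the "important" steps
    # receive most of the credit.

    import torch
    import torch.nn as nn

    class ReturnPredictor(nn.Module):
        """Predicts the episode return from each prefix of a trajectory."""
        def __init__(self, obs_dim, hidden=64):
            super().__init__()
            self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, traj):               # traj: (batch, T, obs_dim)
            h, _ = self.rnn(traj)              # hidden state at every step
            return self.head(h).squeeze(-1)    # (batch, T) running predictions

    def redistributed_rewards(model, traj):
        """Per-step credit = change in the predicted return.

        The differences telescope to the final prediction, so the credit
        summed over the episode equals the predicted return, while steps
        that move the prediction most receive most of the credit.
        """
        with torch.no_grad():
            g = model(traj)                    # (batch, T)
        prev = torch.cat([torch.zeros_like(g[:, :1]), g[:, :-1]], dim=1)
        return g - prev                        # dense proxy rewards

These proxy rewards would then replace or augment the environment reward in an actor-critic update, matching the quote's description; the predictor itself would be trained by regressing its final-step output onto the observed episode return.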