2022
DOI: 10.48550/arxiv.2203.15845
Preprint

Topological Experience Replay

Abstract: State-of-the-art deep Q-learning methods update Q-values using state transition tuples sampled from the experience replay buffer. This strategy typically samples data uniformly at random or prioritizes sampling based on measures such as the temporal difference (TD) error. Such sampling strategies can be inefficient at learning the Q-function because a state's Q-value depends on the Q-values of successor states. If the data sampling strategy ignores the precision of the Q-value estimate of the next state, it can lead to …
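To make the dependence on successor values concrete, below is a minimal tabular sketch (not the paper's implementation) of TD-error-prioritized sampling: the target for a transition (s, a, r, s') bootstraps from max_b Q(s', b), so an update is only as useful as the successor's current estimate. All names, constants, and the tabular setting are illustrative assumptions.

```python
import random
from collections import defaultdict

# Minimal sketch (illustrative only) of TD-error-prioritized sampling for
# tabular Q-learning. The target for (s, a) bootstraps from max_b Q(s_next, b),
# so an update is only as good as the successor state's current estimate.

GAMMA, ALPHA = 0.99, 0.1
Q = defaultdict(float)        # Q[(state, action)] -> estimated value
buffer = []                   # list of (s, a, r, s_next, done) transitions

def td_error(s, a, r, s_next, done, actions):
    bootstrap = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    return r + GAMMA * bootstrap - Q[(s, a)]

def sample_prioritized(actions):
    # Priority proportional to |TD error|; the small epsilon keeps
    # zero-error transitions reachable.
    weights = [abs(td_error(*t, actions)) + 1e-3 for t in buffer]
    return random.choices(buffer, weights=weights, k=1)[0]

def update(transition, actions):
    s, a, r, s_next, done = transition
    Q[(s, a)] += ALPHA * td_error(s, a, r, s_next, done, actions)
```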

Cited by 3 publications (3 citation statements)
References 8 publications
“…In order to avoid noise caused by outliers with large deviations in density values, and therefore to ensure robustness, a rank-based prioritization is used after sorting the replay memory by the complementary densities (1 − ρ). [23] proposed Topological Experience Replay (TER), which prioritizes state transitions by constructing a graph from gathered training data and applying a reverse-sweep algorithm to update a Q-function. The graph is structured such that vertices represent environment states, while edges correspond to state transitions, i.e., experiences.…”
Section: A. Related Work
confidence: 99%
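The description above can be made concrete with a small sketch, assuming a tabular Q-function and a discrete action set; this is illustrative code, not the TER authors' implementation. Transitions are stored as edges of a directed graph keyed by their successor state, and a breadth-first reverse sweep from terminal states orders the updates so that successors are refreshed before their predecessors.

```python
from collections import defaultdict, deque

# Sketch of a graph-based reverse sweep (illustrative only). Vertices are
# environment states; each directed edge s -> s_next stores the transition
# (a, r, done) that produced it. Q-updates are applied in breadth-first order
# starting from terminal states, so a state's bootstrap target is computed
# after its successors have already been updated.

GAMMA, ALPHA = 0.99, 0.1

class TransitionGraph:
    def __init__(self, actions):
        self.actions = actions
        self.in_edges = defaultdict(list)   # s_next -> [(s, a, r, done), ...]
        self.terminals = set()
        self.Q = defaultdict(float)

    def add(self, s, a, r, s_next, done):
        self.in_edges[s_next].append((s, a, r, done))
        if done:
            self.terminals.add(s_next)

    def reverse_sweep(self):
        # BFS over reversed edges, seeded at terminal states.
        frontier = deque(self.terminals)
        visited = set(self.terminals)
        while frontier:
            s_next = frontier.popleft()
            for (s, a, r, done) in self.in_edges[s_next]:
                bootstrap = 0.0 if done else max(self.Q[(s_next, b)] for b in self.actions)
                target = r + GAMMA * bootstrap
                self.Q[(s, a)] += ALPHA * (target - self.Q[(s, a)])
                if s not in visited:
                    visited.add(s)
                    frontier.append(s)
```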
“…The follow-up work [1] analyzed off-policy Q-learning with linear function approximation and reverse experience replay to provide near-optimal convergence guarantees, using the special supermartingale structure endowed by reverse experience replay. [12] considers topological experience replay, which executes reverse replay over a directed graph of observed transitions. When mixed with PER, this enables non-trivial learning in some hard environments.…”
Section: Reverse Sweep Techniques
confidence: 99%
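For contrast with the graph-based variant above, here is a minimal sketch of plain reverse experience replay over a single episode; the tabular setting, names, and constants are assumptions for illustration only.

```python
from collections import defaultdict

# Illustrative sketch of reverse experience replay (RER): after an episode
# ends, its transitions are replayed from last to first, so reward information
# at the end of the trajectory reaches earlier states in a single pass.

GAMMA, ALPHA = 0.99, 0.1
Q = defaultdict(float)

def replay_episode_in_reverse(episode, actions):
    """episode: list of (s, a, r, s_next, done) in the order they occurred."""
    for s, a, r, s_next, done in reversed(episode):
        bootstrap = 0.0 if done else max(Q[(s_next, b)] for b in actions)
        target = r + GAMMA * bootstrap
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```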
“…Starting from the simplest form of experience replay used in [19], where transitions are sampled uniformly from a buffer (UER), several specialized methods have been proposed and evaluated experimentally. These include prioritized experience replay (PER) [23], hindsight experience replay (HER) [2], reverse experience replay (RER) [22], and topological experience replay (TER) [12]. The design of experience replay continues to be an active field of research; however, theoretical analyses have been limited.…”
Section: Introduction
confidence: 99%
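As a baseline for the specialized methods listed above, uniform experience replay (UER) reduces to a bounded FIFO store with uniform minibatch sampling; the sketch below is illustrative, with capacity and batch size chosen arbitrarily.

```python
import random
from collections import deque

# Minimal sketch of uniform experience replay (UER): a bounded FIFO buffer
# from which minibatches are drawn uniformly at random. Capacity and batch
# size are illustrative choices, not values from the cited papers.

class UniformReplayBuffer:
    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)

    def add(self, s, a, r, s_next, done):
        self.storage.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        return random.sample(list(self.storage), min(batch_size, len(self.storage)))
```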