2022
DOI: 10.48550/arxiv.2203.15845
Preprint

Topological Experience Replay

Abstract: State-of-the-art deep Q-learning methods update Q-values using state transition tuples sampled from the experience replay buffer. This strategy typically samples data uniformly at random or prioritizes sampling based on measures such as the temporal difference (TD) error. Such sampling strategies can be inefficient at learning the Q-function because a state's Q-value depends on the Q-values of successor states. If the data sampling strategy ignores the precision of the Q-value estimate of the next state, it can lead to …
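To make the dependence on successor values concrete, below is a minimal tabular sketch (not the paper's implementation) of TD-error-prioritized sampling: the target for a transition (s, a, r, s') bootstraps from max_b Q(s', b), so an update is only as useful as the successor's current estimate. All names, constants, and the tabular setting are illustrative assumptions.

```python
import random
from collections import defaultdict

# Minimal sketch (illustrative only) of TD-error-prioritized sampling for
# tabular Q-learning. The target for (s, a) bootstraps from max_b Q(s_next, b),
# so an update is only as good as the successor state's current estimate.

GAMMA, ALPHA = 0.99, 0.1
Q = defaultdict(float)        # Q[(state, action)] -> estimated value
buffer = []                   # list of (s, a, r, s_next, done) transitions

def td_error(s, a, r, s_next, done, actions):
    bootstrap = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    return r + GAMMA * bootstrap - Q[(s, a)]

def sample_prioritized(actions):
    # Priority proportional to |TD error|; the small epsilon keeps
    # zero-error transitions reachable.
    weights = [abs(td_error(*t, actions)) + 1e-3 for t in buffer]
    return random.choices(buffer, weights=weights, k=1)[0]

def update(transition, actions):
    s, a, r, s_next, done = transition
    Q[(s, a)] += ALPHA * td_error(s, a, r, s_next, done, actions)
```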

Cited by 3 publications (3 citation statements)
References 8 publications
“…In order to avoid noise caused by outliers with large deviations in density values, and therefore to ensure robustness, a rank-based prioritization is used after sorting the replay memory by the complementary densities (1 − ρ). [23] proposed Topological Experience Replay (TER), which prioritizes state transitions by constructing a graph from gathered training data and applying a reverse-sweep algorithm to update a Q-function. The graph is structured such that vertices represent environment states, while edges correspond to state transitions, i.e., experiences.…”
Section: A. Related Work
confidence: 99%
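The description above can be made concrete with a small sketch, assuming a tabular Q-function and a discrete action set; this is illustrative code, not the TER authors' implementation. Transitions are stored as edges of a directed graph keyed by their successor state, and a breadth-first reverse sweep from terminal states orders the updates so that successors are refreshed before their predecessors.

```python
from collections import defaultdict, deque

# Sketch of a graph-based reverse sweep (illustrative only). Vertices are
# environment states; each directed edge s -> s_next stores the transition
# (a, r, done) that produced it. Q-updates are applied in breadth-first order
# starting from terminal states, so a state's bootstrap target is computed
# after its successors have already been updated.

GAMMA, ALPHA = 0.99, 0.1

class TransitionGraph:
    def __init__(self, actions):
        self.actions = actions
        self.in_edges = defaultdict(list)   # s_next -> [(s, a, r, done), ...]
        self.terminals = set()
        self.Q = defaultdict(float)

    def add(self, s, a, r, s_next, done):
        self.in_edges[s_next].append((s, a, r, done))
        if done:
            self.terminals.add(s_next)

    def reverse_sweep(self):
        # BFS over reversed edges, seeded at terminal states.
        frontier = deque(self.terminals)
        visited = set(self.terminals)
        while frontier:
            s_next = frontier.popleft()
            for (s, a, r, done) in self.in_edges[s_next]:
                bootstrap = 0.0 if done else max(self.Q[(s_next, b)] for b in self.actions)
                target = r + GAMMA * bootstrap
                self.Q[(s, a)] += ALPHA * (target - self.Q[(s, a)])
                if s not in visited:
                    visited.add(s)
                    frontier.append(s)
```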
“…The follow-up work [1] analyzed off-policy Q-learning with linear function approximation and reverse experience replay to provide near-optimal convergence guarantees, using the special supermartingale structure endowed by reverse experience replay. [12] considers topological experience replay, which executes reverse replay over a directed graph of observed transitions. When mixed with PER, this enables non-trivial learning in some hard environments.…”
Section: Reverse Sweep Techniques
confidence: 99%
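For contrast with the graph-based variant above, here is a minimal sketch of plain reverse experience replay over a single episode; the tabular setting, names, and constants are assumptions for illustration only.

```python
from collections import defaultdict

# Illustrative sketch of reverse experience replay (RER): after an episode
# ends, its transitions are replayed from last to first, so reward information
# at the end of the trajectory reaches earlier states in a single pass.

GAMMA, ALPHA = 0.99, 0.1
Q = defaultdict(float)

def replay_episode_in_reverse(episode, actions):
    """episode: list of (s, a, r, s_next, done) in the order they occurred."""
    for s, a, r, s_next, done in reversed(episode):
        bootstrap = 0.0 if done else max(Q[(s_next, b)] for b in actions)
        target = r + GAMMA * bootstrap
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```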
“…Starting from the simplest form of experience replay used in [19], where transitions are sampled uniformly from a buffer (UER), several specialized methods have been proposed and evaluated experimentally. These include prioritized experience replay (PER) [23], hindsight experience replay (HER) [2], reverse experience replay (RER) [22], and topological experience replay (TER) [12]. The design of experience replay continues to be an active field of research; however, theoretical analyses have been limited.…”
Section: Introduction
confidence: 99%
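As a baseline for the specialized methods listed above, uniform experience replay (UER) reduces to a bounded FIFO store with uniform minibatch sampling; the sketch below is illustrative, with capacity and batch size chosen arbitrarily.

```python
import random
from collections import deque

# Minimal sketch of uniform experience replay (UER): a bounded FIFO buffer
# from which minibatches are drawn uniformly at random. Capacity and batch
# size are illustrative choices, not values from the cited papers.

class UniformReplayBuffer:
    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)

    def add(self, s, a, r, s_next, done):
        self.storage.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        return random.sample(list(self.storage), min(batch_size, len(self.storage)))
```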