2021
DOI: 10.1007/978-3-030-92270-2_6

Improving Generalization of Reinforcement Learning for Multi-agent Combating Games

Cited by 6 publications (7 citation statements)
References 2 publications
“…We use E-MAML [39] and ProMP [35] as baselines representing gradient-based meta-RL methods. We implement RL²-based Mixreg [43] to evaluate the difference between generating mixture tasks in the latent space and the observation space.…”
Section: Methods
confidence: 99%
“…Many image augmentation techniques such as random convolution, random shift, L2-regularization, dropout, batch normalization and noise injection are shown to improve the generalization of RL [5,16,21,23,32,49]. Mixreg [43], which applies the idea of mix-up [52] in RL, generates new training data as a convex interpolation of input observations and output rewards. LDM can be thought of as a data-augmentation method in that it generates a mixture task using the data from training tasks.…”
Section: Data-augmentation for Reinforcement Learning
confidence: 99%
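The convex interpolation described in this statement can be illustrated with a minimal sketch (not the cited papers' code; the helper name mixreg_batch, the Beta(alpha, alpha) mixing coefficient, and the NumPy batch layout are assumptions based on the mixup recipe):

import numpy as np

def mixreg_batch(obs, rewards, alpha=0.2, rng=None):
    # Mix random pairs of observations and their rewards with one shared weight.
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing coefficient in (0, 1)
    perm = rng.permutation(obs.shape[0])    # random pairing within the batch
    mixed_obs = lam * obs + (1.0 - lam) * obs[perm]
    mixed_rewards = lam * rewards + (1.0 - lam) * rewards[perm]
    return mixed_obs, mixed_rewards

The mixed batch would then be fed to the RL update in place of, or alongside, the original samples; this observation-space mixing is what the first statement above contrasts with generating mixture tasks in the latent space.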
“…Prior work has explored improving generalization by training on a large number of levels [13], adding data-augmentation to visual inputs [21,22], knowledge transfer during training [23], and self-supervised world models [24]. Beyond zero-shot generalization, few prior works use Procgen to evaluate meta-RL, one exception being Alver et al [25], which showed that RL² failed to generalize on simplified Procgen games.…”
Section: Procgen Experiments
confidence: 99%
“…In Table 8 we compare our best hyperbolic PPO agent with the reported results for the current SotA Procgen algorithms from Raileanu & Fergus (2021). All these works propose domain-specific practices on top of PPO (Schulman et al, 2017), designed and tuned for the Procgen benchmark: Mixture Regularization (MixReg) (Wang et al, 2020), Prioritized Level Replay (PLR) (Jiang et al, 2021), Data-regularized Actor-Critic (DrAC) (Raileanu et al, 2020), Phasic Policy Gradient (PPG) (Cobbe et al, 2021), and Invariant Decoupled Advantage Actor-Critic (IDAAC) (Raileanu & Fergus, 2021). Validating our implementation, we see that our Euclidean PPO results closely match the previously reported ones, lagging severely behind all other methods. In contrast, we see that introducing our deep hyperbolic representations framework makes PPO outperform all considered baselines except IDAAC, attaining overall similar scores to this algorithm, which employs several domain-specific practices.…”
Section: D.2 SotA Comparison on Procgen
confidence: 99%