2021
DOI: 10.1007/978-3-030-92270-2_6

Improving Generalization of Reinforcement Learning for Multi-agent Combating Games

Cited by 6 publications (7 citation statements)
References 2 publications
“…We use E-MAML [39] and ProMP [35] as baselines representing gradient-based meta-RL methods. We implement RL²-based Mixreg [43] to evaluate the difference between generating mixture tasks in the latent space and the observation space.…”
Section: Methods
confidence: 99%
“…Many image augmentation techniques such as random convolution, random shift, L2-regularization, dropout, batch normalization and noise injection are shown to improve the generalization of RL [5,16,21,23,32,49]. Mixreg [43], which applies the idea of mix-up [52] in RL, generates new training data as a convex interpolation of input observations and output rewards. LDM can be thought of as a data-augmentation method in that it generates a mixture task using the data from training tasks.…”
Section: Data-augmentation for Reinforcement Learning
confidence: 99%
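The convex interpolation described in this statement can be illustrated with a minimal sketch (not the cited papers' code; the helper name mixreg_batch, the Beta(alpha, alpha) mixing coefficient, and the NumPy batch layout are assumptions based on the mixup recipe):

import numpy as np

def mixreg_batch(obs, rewards, alpha=0.2, rng=None):
    # Mix random pairs of observations and their rewards with one shared weight.
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing coefficient in (0, 1)
    perm = rng.permutation(obs.shape[0])    # random pairing within the batch
    mixed_obs = lam * obs + (1.0 - lam) * obs[perm]
    mixed_rewards = lam * rewards + (1.0 - lam) * rewards[perm]
    return mixed_obs, mixed_rewards

The mixed batch would then be fed to the RL update in place of, or alongside, the original samples; this observation-space mixing is what the first statement above contrasts with generating mixture tasks in the latent space.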
“…Prior work has explored improving generalization by training on a large number of levels [13], adding data-augmentation to visual inputs [21,22], knowledge transfer during training [23], and self-supervised world models [24]. Beyond zero-shot generalization, few prior works use Procgen to evaluate meta-RL, one exception being Alver et al [25], which showed that RL² failed to generalize on simplified Procgen games.…”
Section: Procgen Experiments
confidence: 99%
“…In Table 8 we compare our best hyperbolic PPO agent with the reported results for the current SotA Procgen algorithms from Raileanu & Fergus (2021). All these works propose domain-specific practices on top of PPO (Schulman et al, 2017), designed and tuned for the Procgen benchmark: Mixture Regularization (MixReg) (Wang et al, 2020), Prioritized Level Replay (PLR) (Jiang et al, 2021), Data-regularized Actor-Critic (DrAC) (Raileanu et al, 2020), Phasic Policy Gradient (PPG) (Cobbe et al, 2021), and Invariant Decoupled Advantage Actor-Critic (IDAAC) (Raileanu & Fergus, 2021). Validating our implementation, we see that our Euclidean PPO results closely match the previously reported ones, lagging severely behind all other methods. In contrast, we see that introducing our deep hyperbolic representations framework makes PPO outperform all considered baselines except IDAAC, attaining overall similar scores to this algorithm, which employs several domain-specific practices.…”
Section: D.2 SotA Comparison on Procgen
confidence: 99%