How Should an Agent Practice?
2020 · DOI: 10.1609/aaai.v34i04.5995

Abstract: We present a method for learning intrinsic reward functions to drive the learning of an agent during periods of practice in which extrinsic task rewards are not available. During practice, the environment may differ from the one available for training and evaluation with extrinsic rewards. We refer to this setup of alternating periods of practice and objective evaluation as practice-match, drawing an analogy to regimes of skill acquisition common for humans in sports and games. The agent must effectively use p…
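
To make the alternation concrete, here is a minimal sketch of the practice-match control flow in Python. Everything in it is an illustrative assumption rather than the paper's implementation: the one-dimensional chain environment, the single-parameter policy, and the finite-difference updates standing in for the policy-gradient and meta-gradient steps. What it preserves is the structure the abstract describes: practice episodes expose only a learned intrinsic reward, match episodes expose the extrinsic reward, and match performance is what adjusts the intrinsic-reward parameter.

import random

CHAIN_LENGTH, EPISODE_STEPS = 6, 12

def rollout(bias):
    """One episode on a 1-D chain; returns visited states and the extrinsic reward."""
    pos, states = 0, []
    for _ in range(EPISODE_STEPS):
        pos = max(0, min(CHAIN_LENGTH, pos + (1 if random.random() < bias else -1)))
        states.append(pos)
    return states, float(pos == CHAIN_LENGTH)  # extrinsic reward: reach the right end

def intrinsic_return(states, eta):
    """Learned intrinsic reward, here simply eta * state (progress along the chain)."""
    return sum(eta * s for s in states)

def finite_diff_update(value, score_fn, step=0.1, lr=0.5):
    """Crude gradient stand-in: move `value` toward the better-scoring perturbation."""
    return value + lr * step * (1 if score_fn(value + step) > score_fn(value - step) else -1)

def match_return_after_practice(bias, eta, n=3):
    """Run one practice update under a candidate eta, then score the result on matches."""
    b = finite_diff_update(bias, lambda b2: intrinsic_return(rollout(min(max(b2, 0.0), 1.0))[0], eta))
    return sum(rollout(min(max(b, 0.0), 1.0))[1] for _ in range(n))

bias, eta = 0.5, 0.0  # policy parameter and intrinsic-reward parameter
for phase in range(100):
    # Practice phase: only the intrinsic reward parameterised by eta is observed.
    bias = min(max(finite_diff_update(
        bias, lambda b: intrinsic_return(rollout(min(max(b, 0.0), 1.0))[0], eta)), 0.0), 1.0)
    # Match phase: extrinsic reward is observed and used to adapt eta, crediting eta
    # for how well the practice update it induces performs in matches.
    eta = finite_diff_update(eta, lambda e: match_return_after_practice(bias, e))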

Cited by 4 publications (4 citation statements). References 4 publications (9 reference statements).

“…In the single-task case, the learned intrinsic reward can help accelerate learning simply by adding it to the task-defining reward [275,132]. Rajendran et al [184] consider a different kind of meta-learning setting, where the agent can freely practice in the environment between regular evaluation episodes, with the idea that the most efficient kind of practicing may not be the same as just trying to maximize the task-defining reward. The agent does not have access to the environment reward during the practice episodes and instead optimizes a meta-learned intrinsic reward.…”
Section: Learning Intrinsic Rewards
confidence: 99%
“…Directly optimizing over these long task horizons is challenging because it can result in vanishing or exploding gradients and has infeasible memory requirements [144]. Instead, as described above, most many-shot meta-RL algorithms adopt a surrogate objective, which considers only one or a few update steps in the inner loop [275,109,229,165,184,266,274,15,230]. These algorithms use either A2C [153]-style [165,274,15,230] or DDPG [125]-style [109] actor-critic objectives in the outer loop.…”
Section: Auxiliary Tasks
confidence: 99%
“…In a previous work, Rajendran et al (2020) considered a learning process composed of agnostic pre-training (called a practice) and supervised fine-tuning (a match) in a class of environments. However, in their setting the two phases alternate, and the supervision signal of the matches makes it possible to learn the reward for the practice through a meta-gradient.…”
Section: Related Work
confidence: 99%
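
Schematically, and in generic notation that is not taken from either paper, the meta-gradient described above couples the two phases by differentiating match performance through the practice update:

\[
\theta'(\eta) \;=\; \theta + \alpha \,\nabla_{\theta} J^{\mathrm{practice}}\!\big(\theta;\, r_{\eta}\big),
\qquad
\eta \;\leftarrow\; \eta + \beta \,\nabla_{\eta} J^{\mathrm{match}}\!\big(\theta'(\eta)\big),
\]

where r_eta is the intrinsic reward used during practice, J^practice is the practice objective it induces, and J^match is the extrinsic return obtained in matches.
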
“…We should note that Rajendran et al [37] also proposed a transfer framework based on intrinsic rewards. In their work, the agent switches between practice episodes, where it receives only intrinsic rewards, and match episodes, where it receives only extrinsic rewards.…”
Section: Learning To Explore
confidence: 99%