Information asymmetry in KL-regularized RL
Preprint, 2019
DOI: 10.48550/arxiv.1905.01240

Abstract: Many real-world tasks exhibit rich structure that is repeated across different parts of the state space or in time. In this work we study the possibility of leveraging such repeated structure to speed up and regularize learning. We start from the KL-regularized expected reward objective, which introduces an additional component, a default policy. Instead of relying on a fixed default policy, we learn it from data. But crucially, we restrict the amount of information the default policy receives, forcing it to le…
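For orientation, the objective described in the abstract can be written schematically as below. This is a sketch based only on the abstract's description; the KL weight α, the full agent state x_t, and the restricted view x_t^D given to the default policy are assumed symbols, not necessarily the paper's exact notation.

J(\pi, \pi_0) \;=\; \mathbb{E}_{\pi}\!\left[\sum_t \gamma^t \Big( r(s_t, a_t) \;-\; \alpha\,\mathrm{KL}\big(\pi(\cdot \mid x_t) \,\big\|\, \pi_0(\cdot \mid x_t^{D})\big) \Big)\right]

Here x_t^{D} deliberately omits part of the information in x_t, so the learned default policy π_0 can only capture behavior that is useful across all the states or tasks it cannot distinguish, which is what lets it act as a regularizer for π.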

Cited by 11 publications (30 citation statements)
References 20 publications (38 reference statements)
“…IA can be understood as the masking of information accessible by certain modules. Not conditioning on specific environment aspects forces independence and generalisation across them [8]. In the context of hierarchical KL-regularized RL, the explored asymmetries between the high-level policy, π^H, and prior, π^H_0, have been narrow [14,19].…”
Section: Information Asymmetry (mentioning)
confidence: 99%
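The masking described in the quote above can be illustrated with a minimal sketch. This is a generic illustration of information asymmetry via observation masking, not code from the cited papers; the observation keys and interface are assumptions.

# Minimal sketch of information asymmetry (IA) as observation masking.
# Hypothetical observation keys; real environments and interfaces differ.

def mask_observation(obs: dict, hidden_keys=("task_id", "goal")) -> dict:
    """Return the restricted view given to the default policy / prior.

    Keys listed in `hidden_keys` are withheld, so the prior cannot condition
    on them and must learn behavior that generalizes across their values.
    """
    return {k: v for k, v in obs.items() if k not in hidden_keys}

obs = {"proprioception": [0.1, -0.3], "goal": [1.0, 2.0], "task_id": 3}
policy_input = obs                     # the policy conditions on everything
prior_input = mask_observation(obs)    # the prior sees only the unmasked keys

Because the prior never sees the withheld keys, it must place probability mass on actions that are sensible across all the goals or tasks it cannot tell apart, which is the sense in which masking "forces independence and generalisation" across those aspects.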
“…By presenting multiple priors, we enable a comparison with existing literature [14,19,20,21]. With the right masking, one can recover previously investigated asymmetries [14,19], explore additional ones, and also express purely hierarchical [9] and KL-regularized equivalents [8].…”
Section: Information Asymmetry (mentioning)
confidence: 99%