2022
DOI: 10.48550/arxiv.2204.13695
Preprint

Bilinear value networks

Abstract: The dominant framework for off-policy multi-goal reinforcement learning involves estimating a goal-conditioned Q-value function. When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the Q-function to new goals. The de-facto paradigm is to approximate Q(s, a, g) using monolithic neural networks. To improve generalization of the Q-function, we propose a bilinear decomposition that represents the Q-value via a low-rank approximation in the form of a dot product bet…
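As a rough illustration of the decomposition the abstract describes, below is a minimal PyTorch sketch of a bilinear Q-network that scores a goal-conditioned state-action pair as the dot product of two learned embeddings. The class name, layer widths, and embedding dimension d are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class BilinearQNetwork(nn.Module):
    # Sketch of Q(s, a, g) = f(s, a)^T φ(s, g): the Q-value is the dot
    # product of a state-action embedding and a state-goal embedding.
    # Hypothetical architecture; hidden width and d are assumptions.
    def __init__(self, state_dim, action_dim, goal_dim, d=16, hidden=256):
        super().__init__()
        # f(s, a): embeds the state-action pair into R^d
        self.f = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, d),
        )
        # φ(s, g): embeds the state-goal pair into R^d
        self.phi = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, d),
        )

    def forward(self, s, a, g):
        f_sa   = self.f(torch.cat([s, a], dim=-1))    # (batch, d)
        phi_sg = self.phi(torch.cat([s, g], dim=-1))  # (batch, d)
        return (f_sa * phi_sg).sum(dim=-1)            # scalar Q per batch element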

Cited by 2 publications (9 citation statements) | References 2 publications
“…Q(s, a, g) = f(s, a)⊤φ(g). The motivation behind bilinear value networks is that decomposing Q(s, a, g) = f(s, a)⊤φ(s, g) may result in better learning efficiency than the simpler low-rank decomposition Q(s, a, g) = f(s, a)⊤φ(g) (Hong, Yang, and Agrawal 2022). Besides the above approaches designed specifically for GCRL, Pitis et al. (2020) proposed the Deep Norm (DN) and Wide Norm (WN) families of neural networks that respect the triangle inequality.…”
Section: Related Work
“…For actor-critic-like methods, prior work has proposed decomposing the critic function (a.k.a. the action-value function Q(s, a, g), see Sec. 2.1) into a bilinear network, e.g., either Q(s, a, g) = f(s, a)⊤φ(g) (Schaul et al. 2015) or Q(s, a, g) = f(s, a)⊤φ(s, g) (Hong, Yang, and Agrawal 2022), where f and φ are separate neural modules. The principle behind these designs is to inject useful inductive bias into the architecture.…”
Section: Introduction
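To make the two cited parameterizations concrete, here is a small self-contained sketch; random tensors stand in for the learned embedding networks f and φ, and the batch size and embedding dimension d = 16 are arbitrary assumptions.

import torch

batch, d = 4, 16
f_sa   = torch.randn(batch, d)  # f(s, a): state-action embedding
phi_g  = torch.randn(batch, d)  # φ(g): goal-only embedding (Schaul et al. 2015)
phi_sg = torch.randn(batch, d)  # φ(s, g): state-goal embedding (Hong, Yang, and Agrawal 2022)

q_uvfa = (f_sa * phi_g).sum(dim=-1)   # Q(s, a, g) = f(s, a)⊤φ(g)
q_bvn  = (f_sa * phi_sg).sum(dim=-1)  # Q(s, a, g) = f(s, a)⊤φ(s, g)
assert q_uvfa.shape == q_bvn.shape == (batch,)  # one scalar Q-value per batch element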