2022
DOI: 10.48550/arxiv.2204.13695
Preprint

Bilinear value networks

Abstract: The dominant framework for off-policy multi-goal reinforcement learning involves estimating a goal-conditioned Q-value function. When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the Q-function to new goals. The de-facto paradigm is to approximate Q(s, a, g) using monolithic neural networks. To improve generalization of the Q-function, we propose a bilinear decomposition that represents the Q-value via a low-rank approximation in the form of a dot product bet…
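As a rough illustration of the decomposition the abstract describes, below is a minimal PyTorch sketch of a bilinear Q-network that scores a goal-conditioned state-action pair as the dot product of two learned embeddings. The class name, layer widths, and embedding dimension d are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class BilinearQNetwork(nn.Module):
    # Sketch of Q(s, a, g) = f(s, a)^T φ(s, g): the Q-value is the dot
    # product of a state-action embedding and a state-goal embedding.
    # Hypothetical architecture; hidden width and d are assumptions.
    def __init__(self, state_dim, action_dim, goal_dim, d=16, hidden=256):
        super().__init__()
        # f(s, a): embeds the state-action pair into R^d
        self.f = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, d),
        )
        # φ(s, g): embeds the state-goal pair into R^d
        self.phi = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, d),
        )

    def forward(self, s, a, g):
        f_sa   = self.f(torch.cat([s, a], dim=-1))    # (batch, d)
        phi_sg = self.phi(torch.cat([s, g], dim=-1))  # (batch, d)
        return (f_sa * phi_sg).sum(dim=-1)            # scalar Q per batch element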

Cited by 2 publications (9 citation statements) | References 2 publications
“…Q(s, a, g) = f(s, a)⊤φ(g). The motivation behind bilinear value networks is that decomposing Q(s, a, g) = f(s, a)⊤φ(s, g) may result in better learning efficiency than the simpler low-rank decomposition Q(s, a, g) = f(s, a)⊤φ(g) (Hong, Yang, and Agrawal 2022). Besides the above approaches designed specifically for GCRL, Pitis et al. (2020) proposed the Deep Norm (DN) and Wide Norm (WN) families of neural networks that respect the triangle inequality.…”
Section: Related Work
“…For actor-critic-like methods, prior work has proposed decomposing the critic function (a.k.a. the action-value function Q(s, a, g), see Sec. 2.1) into a bilinear network, e.g., either Q(s, a, g) = f(s, a)⊤φ(g) (Schaul et al. 2015) or Q(s, a, g) = f(s, a)⊤φ(s, g) (Hong, Yang, and Agrawal 2022), where f and φ are separate neural modules. The principle behind these designs is to inject useful inductive bias into the architecture.…”
Section: Introduction
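To make the two cited parameterizations concrete, here is a small self-contained sketch; random tensors stand in for the learned embedding networks f and φ, and the batch size and embedding dimension d = 16 are arbitrary assumptions.

import torch

batch, d = 4, 16
f_sa   = torch.randn(batch, d)  # f(s, a): state-action embedding
phi_g  = torch.randn(batch, d)  # φ(g): goal-only embedding (Schaul et al. 2015)
phi_sg = torch.randn(batch, d)  # φ(s, g): state-goal embedding (Hong, Yang, and Agrawal 2022)

q_uvfa = (f_sa * phi_g).sum(dim=-1)   # Q(s, a, g) = f(s, a)⊤φ(g)
q_bvn  = (f_sa * phi_sg).sum(dim=-1)  # Q(s, a, g) = f(s, a)⊤φ(s, g)
assert q_uvfa.shape == q_bvn.shape == (batch,)  # one scalar Q-value per batch element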