2020
DOI: 10.48550/arxiv.2010.09536
Preprint

What About Inputting Policy in Value Function: Policy Representation and Policy-extended Value Function Approximator

Abstract: The value function lies at the heart of Reinforcement Learning (RL): it defines the long-term evaluation of a policy in a given state. In this paper, we propose the Policy-extended Value Function Approximator (PeVFA), which extends the conventional value function to be a function not only of the state but also of an explicit policy representation. Such an extension enables a PeVFA to preserve the values of multiple policies, in contrast to a conventional approximator with capacity for only one policy, inducing the new characteristic of…
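To make the core idea concrete, here is a minimal sketch (not the paper's implementation) of a PeVFA as a feed-forward network that takes a state and a policy representation vector as joint input; all layer sizes and names below are illustrative assumptions.

```python
# Hedged sketch of a Policy-extended Value Function Approximator (PeVFA):
# a value network conditioned on both the state and a policy representation.
# Layer sizes and names are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class PeVFA(nn.Module):
    """Value function V(s, chi_pi) over states and policy representations."""

    def __init__(self, state_dim: int, policy_repr_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + policy_repr_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # scalar value estimate
        )

    def forward(self, state: torch.Tensor, policy_repr: torch.Tensor) -> torch.Tensor:
        # Concatenating the policy representation lets a single network
        # preserve the values of many policies at once.
        return self.net(torch.cat([state, policy_repr], dim=-1))

# Usage: a batch of 32 states (dim 8) and policy representations (dim 64).
v = PeVFA(state_dim=8, policy_repr_dim=64)
values = v(torch.randn(32, 8), torch.randn(32, 64))  # shape (32, 1)
```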

Cited by 3 publications (13 citation statements)
References 9 publications
“…Harb et al. (2020) use the actions that the policy samples in probing states as policy representations. Some other articles (Faccio et al., 2021; Tang et al., 2020) use the policy network parameters themselves as policy representations.…”
Section: E2 Representation Learning
Citation type: mentioning; confidence: 99%
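As a rough illustration of the probing-state approach attributed to Harb et al. (2020), the following sketch concatenates a policy's actions at a fixed set of probing states into one representation vector; the function name and the assumption of a deterministic, batch-callable policy are mine, not from the cited work.

```python
# Hedged sketch of probing-state policy representations: a policy is
# represented by the actions it produces at a fixed set of probing states.
import torch

def probing_state_representation(policy, probing_states: torch.Tensor) -> torch.Tensor:
    """Flatten the policy's actions at fixed probing states into one vector.

    policy: a callable mapping a batch of states to a batch of actions
        (assumed deterministic for this illustration).
    probing_states: tensor of shape (num_probes, state_dim), held fixed
        across all policies so their representations are comparable.
    """
    with torch.no_grad():
        actions = policy(probing_states)  # (num_probes, action_dim)
    return actions.flatten()              # (num_probes * action_dim,)
```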
“…Recent work (Tang et al., 2020) learned Parameter-Based State-Value Functions which, coupled with PPO, improved performance. The authors did not use the value function to backpropagate gradients directly through the policy parameters, but only exploited the general policy-evaluation properties of the method.…”
Section: Related Work
Citation type: mentioning; confidence: 99%
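The parameter-as-input approach this quote describes might look like the following sketch, where the value network consumes the flattened policy weights as a plain feature vector; shapes, names, and the flattening scheme are illustrative assumptions.

```python
# Hedged sketch of a parameter-based state-value function: the value network
# takes the flattened policy parameters as an input feature, alongside the state.
import torch
import torch.nn as nn

def flatten_params(policy: nn.Module) -> torch.Tensor:
    # Stack all policy weights into a single fixed-length vector.
    return torch.cat([p.detach().flatten() for p in policy.parameters()])

class ParamBasedValue(nn.Module):
    def __init__(self, state_dim: int, param_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + param_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor, flat_params: torch.Tensor) -> torch.Tensor:
        # As in the citation, no gradient flows back through the policy
        # parameters here; they are detached and treated as plain inputs.
        batch = flat_params.expand(state.shape[0], -1)
        return self.net(torch.cat([state, batch], dim=-1))
```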
“…Intuitively and naturally, such issues can be significantly alleviated if we have an ideal surrogate policy space that is compact in scale while keeping the key features of the original policy space. In this direction, low-dimensional latent representations of policies play an important role in Reinforcement Learning (RL) [34], Opponent Modeling [8], Fast Adaptation [25, 27], Behavioral Characterization [14], etc. In these domains, a few preliminary attempts have been made at devising different policy representations.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
“…Rather than the policy distribution, some other works resort to information about the policy's influence on the environment, e.g., the state(-action) visitation distribution induced by the policy [14, 20]. Recently, Tang et al. [34] offered several methods to learn policy representations through policy contrast or recovery, from both policy network parameters and interaction experiences. Put shortly, the key question of policy representation learning is by what criterion we should abstract the policy space for the desired compression and generalization.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
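The "recovery" criterion mentioned for Tang et al. [34] can be pictured as an autoencoder over flattened policy parameters, as in this hedged sketch; the architecture and loss are illustrative assumptions, not the paper's exact method.

```python
# Hedged sketch of recovery-based policy representation learning: compress
# flattened policy parameters into a low-dimensional code and train it to
# reconstruct ("recover") the original parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyReprAutoencoder(nn.Module):
    def __init__(self, param_dim: int, repr_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(param_dim, repr_dim)  # compress the policy
        self.decoder = nn.Linear(repr_dim, param_dim)  # recover it back

    def forward(self, flat_params: torch.Tensor):
        z = self.encoder(flat_params)
        recovery_loss = F.mse_loss(self.decoder(z), flat_params)
        # Minimizing the recovery loss forces z to compress the raw
        # parameter space while keeping enough information to rebuild it.
        return z, recovery_loss
```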