2020
DOI: 10.48550/arxiv.2010.14498
Preprint

Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning

Abstract: We identify an implicit under-parameterization phenomenon in value-based deep RL methods that use bootstrapping: when value functions, approximated using deep neural networks, are trained with gradient descent using iterated regression onto target values generated by previous instances of the value network, more gradient updates decrease the expressivity of the current value network. We characterize this loss of expressivity in terms of a drop in the rank of the learned value network features, and show that th…
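To make the rank measure in the abstract concrete, here is a minimal Python/NumPy sketch of the effective rank (srank) of a feature matrix, assuming the thresholded singular-value definition used in the paper, srank_δ(Φ) = min{k : (Σ_{i≤k} σ_i) / (Σ_i σ_i) ≥ 1 − δ} with δ = 0.01; the feature matrix and sizes below are illustrative, not the paper's experimental setup.

```python
import numpy as np

def effective_rank(features: np.ndarray, delta: float = 0.01) -> int:
    """Effective rank (srank) of a feature matrix.

    Smallest k such that the top-k singular values account for at
    least a (1 - delta) fraction of the total singular-value mass.
    """
    # Singular values, returned in descending order.
    singular_values = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(singular_values) / singular_values.sum()
    # First index where the cumulative ratio crosses 1 - delta.
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

# Illustrative usage: rows are states in a batch, columns are
# penultimate-layer features of a value network.
phi = np.random.randn(256, 64) @ np.random.randn(64, 64)
print(effective_rank(phi))
```

Tracking this quantity over the course of training is how the drop in expressivity is measured; a shrinking srank means the network is effectively using fewer feature directions.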

Cited by 5 publications (7 citation statements)
References 23 publications (43 reference statements)
“…We perform their experiment, this time with a C51 agent with and without normalised layers, looking to better understand the regularisation effects of normalisation. Figure 21 shows an evolution of the effective rank for the baseline agent that is consistent with the report of Kumar et al. (2020). Interestingly, the baseline agent consistently makes use of fewer and fewer dimensions in the feature space as training progresses, while the normalised agents preserve the rank.…”
Section: C4 Effective Rank (supporting)
confidence: 63%
“…Miyato et al. (2018) contrast SN and WN and argue that the Frobenius norm encourages a loss in the number of usable features of the learned representations. Our experiments support this argument: measuring the effective rank (Kumar et al., 2020) shows a faster loss of feature rank for the baseline agent compared to any SN agent (Fig. 21 in Appendix).…”
Section: Related Work (supporting)
confidence: 60%
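The contrast drawn above between spectral normalisation (SN) and weight normalisation (WN) can be made concrete with a short PyTorch sketch; layer sizes and names are illustrative, not the cited papers' setup.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm, weight_norm

# SN rescales the weight by an estimate of its largest singular value
# (via power iteration), so only the spectral norm is constrained.
sn_layer = spectral_norm(nn.Linear(64, 64))

# WN reparameterises the weight as magnitude * direction (by default
# one magnitude per output unit), so the weight's norm is learned
# explicitly rather than constrained.
wn_layer = weight_norm(nn.Linear(64, 64))
```

The design difference is what the argument above turns on: SN caps a single singular value and leaves the rest of the spectrum free, while WN ties the update to the overall weight norm, which Miyato et al. argue encourages collapsing onto fewer usable feature directions.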
“…The performance of CriticSMC relies heavily on the quality of the critic, and in this work we trained it using a basic TD update from Equation 6. One avenue for future work is devising more efficient and stable algorithms for learning the soft Q function, such as proximal updates [56] or regularization which guards against deterioration [38].…”
Section: Discussion (mentioning)
confidence: 99%
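Equation 6 of the cited paper is not reproduced in the statement above, so the following is only a generic sketch of what a basic one-step TD update for a discrete-action soft Q-function usually looks like, using the standard maximum-entropy Bellman target; q_net, target_net, alpha, and the batch layout are illustrative assumptions, not CriticSMC's actual implementation.

```python
import torch
import torch.nn.functional as F

def soft_td_loss(q_net, target_net, batch, gamma=0.99, alpha=0.1):
    """One-step soft TD loss for a discrete-action soft Q-function.

    Target: r + gamma * alpha * logsumexp(Q_target(s', .) / alpha),
    the usual maximum-entropy Bellman backup (names are illustrative).
    """
    s, a, r, s_next, done = batch          # a: LongTensor of action indices
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(s_next)        # [batch, num_actions]
        soft_v = alpha * torch.logsumexp(next_q / alpha, dim=1)
        target = r + gamma * (1.0 - done) * soft_v
    return F.mse_loss(q_sa, target)
```

The logsumexp term is the soft value α log Σ_a exp(Q(s′, a)/α), which recovers the ordinary max-backup as α → 0; iterating this regression onto a frozen target network is exactly the bootstrapped setup the headline paper studies.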
“…Our evaluation focuses on discrete-action on-policy RL algorithms, since many factors that influence the learning of off-policy methods are still not well understood (Achiam et al., 2019; Kumar et al., 2020; Van Hasselt et al., 2018; Fu et al., 2019). Specifically, we compare three algorithms. … CVS consistently exhibits higher sample efficiency than both PPO and PPOF, showing that dynamic modularity correlates with more efficient transfer.…”
Section: Simple Experiments (mentioning)
confidence: 99%