Revisiting the Softmax Bellman Operator: New Benefits and New Perspective
Preprint (2018). DOI: 10.48550/arxiv.1812.00456

Abstract: The impact of softmax on the value function itself in reinforcement learning (RL) is often viewed as problematic because it leads to sub-optimal value (or Q) functions and interferes with the contraction properties of the Bellman operator. Surprisingly, despite these concerns, and independent of its effect on exploration, the softmax Bellman operator, when combined with Deep Q-learning, leads to Q-functions with superior policies in practice, even outperforming its double Q-learning counterpart. To better under…
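For context, the operator under study replaces the hard max in the Bellman backup with a Boltzmann (softmax) weighting over next-state action values. A minimal NumPy sketch of that backup, with illustrative values for the inverse temperature beta and discount gamma (chosen here, not taken from the paper):

```python
import numpy as np

def softmax_backup(q_next, beta):
    """Boltzmann-weighted average of next-state action values.

    Replaces the hard max of the standard Bellman backup; as beta grows,
    the weighting concentrates on the maximizing action.
    """
    q_next = np.asarray(q_next, dtype=np.float64)
    w = np.exp(beta * (q_next - q_next.max()))  # subtract max for numerical stability
    w /= w.sum()
    return float(np.dot(w, q_next))

def softmax_bellman_target(reward, q_next, gamma=0.99, beta=5.0):
    """One-step target r + gamma * softmax_beta(Q(s', .)) for a single transition."""
    return reward + gamma * softmax_backup(q_next, beta)

# The softmax target sits below the hard-max target, which is the
# sub-optimality the abstract refers to.
print(softmax_bellman_target(1.0, [0.2, 0.9, 0.5]))  # < 1.0 + 0.99 * 0.9
```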

Cited by 11 publications (11 citation statements). References 7 publications (19 reference statements).

“…Therefore, the DBS operator rectifies the convergence issue of the Boltzmann softmax operator with fixed parameters. Note that we also achieve a tighter error bound for the fixed-parameter softmax operator in general cases compared with Song et al. (2018). In addition, we show that the DBS operator achieves a good convergence rate.…”
Section: Introduction (mentioning; confidence: 61%)
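The distinction drawn in the statement above can be seen numerically: with a fixed beta the Boltzmann softmax keeps a constant gap to the hard max, while a schedule in which beta grows without bound drives that gap to zero. A small illustrative sketch (the schedule beta_t = t is a placeholder, not the schedule used in the citing work):

```python
import numpy as np

def boltzmann_softmax(values, beta):
    """boltz_beta(x) = sum_i x_i * exp(beta * x_i) / sum_j exp(beta * x_j)."""
    values = np.asarray(values, dtype=np.float64)
    w = np.exp(beta * (values - values.max()))  # subtract max for numerical stability
    return float(np.dot(w / w.sum(), values))

values = [0.2, 0.9, 0.5]
# A fixed beta keeps a constant gap to max(values); an increasing schedule closes it.
print(boltzmann_softmax(values, 1.0))               # fixed beta: noticeable gap
for t in (1, 10, 100, 1000):
    print(t, boltzmann_softmax(values, float(t)))   # placeholder schedule beta_t = t
```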
“…In this section, we compare the error bound in Corollary 2 with that in Song et al. (2018), which studies the error bound of the softmax operator with a fixed parameter β.…”
Section: Relation To Existing Results (mentioning; confidence: 99%)
“…These layers are implemented using ReLU activation functions with a linear output layer. For both the DQNEnsemble-FSO/RF and DQN-FSO/RF agents, actions are selected using the Boltzmann policy [41]. Our evaluations use the Adam optimizer to minimize the loss function given in Equation 17.…”
Section: B. Evaluation Set-up (mentioning; confidence: 99%)
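The Boltzmann policy mentioned in that statement samples actions in proportion to exponentiated Q-values rather than always taking the argmax. A minimal sketch of that selection rule (the temperature value is illustrative; the cited agents' actual hyperparameters are not given here):

```python
import numpy as np

def boltzmann_policy(q_values, temperature=1.0, rng=None):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    rng = rng if rng is not None else np.random.default_rng()
    q = np.asarray(q_values, dtype=np.float64)
    logits = (q - q.max()) / temperature  # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(q), p=probs))

# Action 1 has the highest Q-value and is picked most often, but the policy
# still explores the other actions with non-zero probability.
print(boltzmann_policy([0.2, 0.9, 0.5], temperature=0.5))
```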