2024
DOI: 10.1109/tnnls.2022.3207346

Deep Reinforcement Learning: A Survey

Cited by 115 publications (51 citation statements)
References 78 publications
“…The difference between Q^π(s_t, a_t) and V^π(s_t) is a lower-variance alternative to the action-value function known as the advantage function A^π(s_t, a_t) since it represents how advantageous it is to take action a_t as compared with the average performance we would expect from state s_t. These quantities are used throughout the RL field, which is conventionally subdivided into three classes of methods: dynamic programming, model free, and model based [31], [32]. Dynamic programming has its origins in optimal control [33] and may be used to compute an optimal policy based on a known MDP.…”
Section: B Reinforcement Learning
confidence: 99%
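A minimal numerical sketch (not from the survey itself; all values are invented) of the relationship this excerpt describes, A^π(s, a) = Q^π(s, a) − V^π(s), where V^π(s) is the policy-weighted average of Q^π(s, ·):

```python
# Illustrative sketch: the advantage A^pi(s, a) = Q^pi(s, a) - V^pi(s)
# measures how much better action a is than the policy's average in state s.
import numpy as np

# Hypothetical tabular estimates for a tiny MDP with 3 states and 2 actions.
q_values = np.array([[1.0, 0.5],
                     [0.2, 0.9],
                     [0.0, 0.3]])          # Q^pi(s, a)
policy = np.array([[0.6, 0.4],
                   [0.5, 0.5],
                   [1.0, 0.0]])            # pi(a | s)

# V^pi(s) is the policy-weighted average of Q^pi(s, a).
v_values = (policy * q_values).sum(axis=1)

# Advantage: positive entries mark actions better than the policy's average.
advantage = q_values - v_values[:, None]
print(advantage)
```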
“…Finally, in model-based methods, we attempt to learn a model of the MDP, which can then be used for planning or to learn a policy by sampling from the MDP and training with a model-free approach (e.g., Dyna-based methods [34]). For a more comprehensive review of the RL field, we refer the reader to Arulkumaran et al.'s [31] or Wang et al.'s [32] deep RL survey.…”
Section: B Reinforcement Learning
confidence: 99%
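For illustration only, a minimal tabular Dyna-Q sketch of the model-based idea mentioned in this excerpt: learn a model of the MDP from real transitions, then replay simulated transitions from that model alongside ordinary model-free updates. A gym-style environment with the classic `reset()`/`step()` 4-tuple API and discrete, hashable states is assumed:

```python
# Minimal Dyna-Q sketch (illustrative; [34] covers the Dyna family in general).
import random
from collections import defaultdict

def dyna_q(env, episodes=50, planning_steps=10, alpha=0.1, gamma=0.99, eps=0.1):
    q = defaultdict(float)          # Q[(state, action)]
    model = {}                      # learned model: (state, action) -> (reward, next_state)
    actions = list(range(env.action_space.n))   # assumes a gym-like discrete action space

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda a_: q[(s, a_)]))
            s2, r, done, _ = env.step(a)

            # model-free (Q-learning) update from the real transition
            target = r + gamma * max(q[(s2, a_)] for a_ in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])

            # learn the model, then plan with simulated transitions
            model[(s, a)] = (r, s2)
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                ptarget = pr + gamma * max(q[(ps2, a_)] for a_ in actions)
                q[(ps, pa)] += alpha * (ptarget - q[(ps, pa)])
            s = s2
    return q
```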
“…A fundamental condition for the acceptance of the construct of NR is the existence of a pre-given world. Reenacting such an agent/world setting corresponds to a large field of DL called deep reinforcement learning (DRL) (Mnih et al., 2015; Silver et al., 2016; Eppe et al., 2022; Wang et al., 2022). DRL trains an ANN, considered as an agent, to select the right actions based on the observations (or states) of an external environment in order to maximize potential reward (Fig.…”
Section: What Is a Representation?
confidence: 99%
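A small sketch of the agent/environment setting this excerpt describes: a network maps environment observations to action scores, and the highest-scoring action is selected. The observation dimension, action count, and use of PyTorch are assumptions for illustration, not details from the cited works:

```python
# Illustrative sketch: a tiny network (the "agent") maps observations to
# action preferences; acting greedily on these preferences is one way to
# select actions aimed at maximizing reward.
import torch
import torch.nn as nn

obs_dim, n_actions = 4, 2            # hypothetical CartPole-like environment

policy = nn.Sequential(               # observation -> action scores
    nn.Linear(obs_dim, 64),
    nn.ReLU(),
    nn.Linear(64, n_actions),
)

def select_action(observation):
    """Pick the action the network currently scores highest (greedy)."""
    with torch.no_grad():
        scores = policy(torch.as_tensor(observation, dtype=torch.float32))
    return int(scores.argmax())
```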
“…It estimates how good it is for the agent to be in a particular state or to take a specific action. The value function can improve the policy by finding the action that leads to the maximum value in each state [89]. There are two main types of value functions: the state-value function and the action-value function.…”
Section: E Deep Reinforcement Learning For Bc
confidence: 99%
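A toy sketch (made-up numbers) of the two value-function types and the greedy improvement step this excerpt refers to: given an action-value table Q(s, a), the state value of the greedy policy equals the maximum action value in each state, and the improved policy simply picks that maximizing action:

```python
# Illustrative only: greedy policy improvement from a hypothetical Q-table.
import numpy as np

# Action-value function Q(s, a) for 3 states and 2 actions (invented values).
Q = np.array([[0.4, 1.2],
              [0.8, 0.3],
              [0.0, 0.5]])

# State-value function of the greedy policy: V(s) = max_a Q(s, a).
V = Q.max(axis=1)

# Policy improvement: in every state, pick the action with the maximum value.
improved_policy = Q.argmax(axis=1)
print(V, improved_policy)
```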