1999
DOI: 10.1017/s0263574799211174

REINFORCEMENT LEARNING: AN INTRODUCTION by Richard S. Sutton and Andrew G. Barto, Adaptive Computation and Machine Learning series, MIT Press (Bradford Book), Cambridge, Mass., 1998, xviii + 322 pp., ISBN 0-262-19398-1 (hardback, £31.95).

Cited by 29 publications (20 citation statements), published 2018–2024; references 1 publication. Citation statements below are ordered by relevance.
“…To overcome the problem of continuous spaces, and to generalize across different observed values, fitted Q iteration [112] is used instead of temporal difference learning [113]. The solution is evaluated for a small company with an EV fleet of 15 EVs, of which 4 EVs are on the morning shift, which runs from 6:00 to 14:00.…”
Section: Centralized Day-ahead Planning
confidence: 99%
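For readers unfamiliar with the batch method mentioned in the statement above, the following is a minimal sketch of fitted Q iteration over a fixed set of transitions. The discrete action set, the omission of terminal-state handling, and the choice of scikit-learn's ExtraTreesRegressor as the function approximator are illustrative assumptions, not details taken from the cited work.

```python
# Minimal fitted Q iteration sketch.
# Assumptions: a batch of transitions has already been collected, actions are
# integer indices, and terminal-state handling is omitted for brevity.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, actions, gamma=0.99, n_iterations=50):
    """transitions: list of (state, action, reward, next_state) tuples,
    with states as 1-D arrays and actions drawn from the list `actions`."""
    states = np.array([s for s, a, r, s2 in transitions])
    acts = np.array([[a] for s, a, r, s2 in transitions])
    rewards = np.array([r for s, a, r, s2 in transitions])
    next_states = np.array([s2 for s, a, r, s2 in transitions])

    X = np.hstack([states, acts])        # regress Q on (state, action) pairs
    q = None
    for _ in range(n_iterations):
        if q is None:
            targets = rewards            # first iterate approximates the immediate reward
        else:
            # Bootstrapped target: r + gamma * max_a' Q_k(s', a')
            next_q = np.column_stack([
                q.predict(np.hstack([next_states,
                                     np.full((len(next_states), 1), a)]))
                for a in actions
            ])
            targets = rewards + gamma * next_q.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return q
```

Because each iteration refits a supervised regressor on the whole batch, the method generalizes across the continuous state space instead of updating one state at a time as incremental temporal-difference learning does.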
“…where Q(s_t, a_t) = E[R_t | s_t, a_t] is the state-action value function, in which the initial action a_t is provided to calculate the expected return when starting in the state s_t. A baseline function b(s_t) is typically subtracted to reduce the variance while not changing the estimated gradient [44,53]. A natural candidate for this baseline is the state-only value function V(s_t) = E[R_t | s_t], which is similar to Q(s_t, a_t), except that a_t is not given.…”
Section: Inner-set Dependency Control
confidence: 99%
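To make the variance-reduction point concrete, the sketch below computes Monte Carlo returns for one episode and subtracts a state-value baseline to form advantage estimates. The helper names, and the assumption that V(s_t) has already been estimated, are illustrative and not taken from the citing paper.

```python
# Sketch of baseline subtraction for policy-gradient variance reduction.
# Assumptions: one finite episode, Monte Carlo returns, and a pre-computed
# array of state-value estimates V(s_t); all names here are illustrative.
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """R_t = r_t + gamma * r_{t+1} + ... for every step of the episode."""
    R, out = 0.0, []
    for r in reversed(rewards):
        R = r + gamma * R
        out.append(R)
    return np.array(out[::-1])

def advantages(rewards, state_values, gamma=0.99):
    """A_t = R_t - V(s_t).  Subtracting the state-dependent baseline V(s_t)
    leaves the expected policy gradient E[A_t * grad log pi(a_t | s_t)]
    unchanged, but typically lowers its variance compared with using R_t."""
    return discounted_returns(rewards, gamma) - np.asarray(state_values)
```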
“…The policy structure (the rule by which the agent selects its next action given the current state) is referred to as the actor, because it is used to choose actions, while the estimated value function is known as the critic, because it criticizes the actions made by the actor. The critic observes and evaluates whether the policy being followed by the actor is appropriate [37].…”
Section: Rlpba
confidence: 99%
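The actor/critic split described in the statement above can be illustrated with a minimal one-step tabular actor-critic loop. The softmax policy parameterisation, the discrete state indices, and the Gymnasium-style reset/step interface are assumptions made for this sketch, not details of the cited paper.

```python
# Minimal one-step tabular actor-critic sketch.
# Assumptions: small discrete state and action spaces, integer observations,
# and an environment following the Gymnasium reset/step convention.
import numpy as np

def actor_critic(env, n_states, n_actions, episodes=500,
                 alpha_actor=0.1, alpha_critic=0.1, gamma=0.99):
    theta = np.zeros((n_states, n_actions))   # actor: policy preferences
    V = np.zeros(n_states)                    # critic: state-value estimates

    def policy(s):
        prefs = theta[s] - theta[s].max()     # numerically stable softmax
        p = np.exp(prefs)
        return p / p.sum()

    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            p = policy(s)
            a = np.random.choice(n_actions, p=p)
            s2, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # Critic: the TD error "criticizes" the action the actor just took.
            td_error = r + (0.0 if terminated else gamma * V[s2]) - V[s]
            V[s] += alpha_critic * td_error
            # Actor: shift preferences in the direction the critic recommends.
            grad_log = -p
            grad_log[a] += 1.0                # gradient of log softmax policy
            theta[s] += alpha_actor * td_error * grad_log
            s = s2
    return theta, V
```

The design point the citing paper makes is visible here: the actor alone decides which action to take, and the critic only supplies the evaluation signal (the TD error) that tells the actor whether that choice was better or worse than expected.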