Discounted Dynamic Programming (1965)
DOI: 10.1214/aoms/1177700285

Cited by 844 publications (413 citation statements)
References 0 publications
“…The value V_N(x) is the supremum of the expected total reward over the finite horizon N with the zero terminal value V_0. As was shown by Blackwell [3], the functions V_N, N = 1, 2, …”
Section: Theorem 13 If There Exists An Optimal Policy Then There Exis… (mentioning)
confidence: 65%
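The recursion behind these finite-horizon values is ordinary backward induction. The following is a minimal illustrative sketch, not taken from the cited papers: it assumes a small hypothetical finite MDP with transition array P, reward array r, and discount factor beta, and computes V_N starting from the zero terminal value V_0 = 0.

```python
# Minimal sketch (hypothetical finite MDP, not from the cited papers):
# V_N(x) = max_a [ r(x, a) + beta * sum_y P(y | x, a) * V_{N-1}(y) ],  with V_0 = 0.
import numpy as np

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x, a, y]
r = rng.uniform(-1.0, 1.0, size=(n_states, n_actions))            # bounded one-step rewards
beta = 0.9                                                         # discount factor in (0, 1)

V = np.zeros(n_states)                 # terminal value V_0 = 0
for N in range(1, 51):
    Q = r + beta * (P @ V)             # Q[x, a] = r(x, a) + beta * E[V_{N-1}(next state)]
    V = Q.max(axis=1)                  # V_N(x) = supremum over actions
```

With bounded rewards and beta < 1, the iterates V_N converge geometrically to the optimal discounted value, which is why truncating at a moderate horizon N already gives a usable approximation.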
“…For the classical dynamic programming problems introduced by Blackwell [3], the reward functions r(x, a) are assumed to be bounded, i.e., |r(x, a)| ≤ K, x ∈ X and a ∈ A(x), for some finite constant K. However, in many operations research applications the reward functions are bounded above, i.e., r(x, a) ≤ K when x ∈ X and a ∈ A(x). For example, in mathematical models of inventory and queueing systems, the one-step holding costs can tend to ∞ as the inventory levels or number of waiting customers increases to ∞.…”
Section: Countable State Space, 2.1 Definitions (mentioning)
confidence: 99%
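To make the bounded-above case concrete, here is a small hypothetical inventory-style reward (the names and numbers are illustrative, not from the cited papers): revenue is capped by the quantity offered for sale, while the holding cost grows linearly in the stock level, so r(x, a) ≤ K for a finite action set even though r(x, a) → −∞ as x → ∞.

```python
# Illustrative sketch of a reward bounded above but not below (hypothetical model):
def one_step_reward(x, a, price=1.0, holding_cost=0.1):
    """x = current stock level, a = units offered for sale this period."""
    sales = min(a, x)                      # cannot sell more than is on hand
    return price * sales - holding_cost * x
# With a finite action set (a <= a_max), r(x, a) <= price * a_max =: K,
# yet r(x, a) -> -infinity as the stock level x grows.
```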
“…Dynamic programming theory for such Markovian cases [Blackwell 1965, Ross 1970] verifies that a search for a policy that is optimal among all policies may be restricted to the search for a policy that is optimal among all stationary policies, i.e., policies that dictate actions only in accordance with the current state, independently of the current stage (year) or past history. Hence, the search for an optimal stationary policy undertaken in this thesis is in fact a search for a policy that is optimal without qualifications.…”
Section: Models Of Dynamic Programming With Markov Chains Have Been U… (mentioning)
confidence: 99%
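In code, a stationary policy is nothing more than a single state-to-action table consulted at every stage. Reusing the hypothetical arrays from the value-iteration sketch above, a greedy policy with respect to the (approximate) optimal value V is stationary by construction:

```python
# Greedy stationary policy from the approximate optimal value V (continues the
# hypothetical MDP sketch above): policy[x] depends only on the current state x,
# not on the stage or the past history.
policy = (r + beta * (P @ V)).argmax(axis=1)
```

Whether this greedy policy is in fact optimal is exactly what the discounted-dynamic-programming theory cited in the excerpt establishes; the snippet only illustrates that "stationary" means time- and history-independent.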