Discounted Dynamic Programming (1965)
DOI: 10.1214/aoms/1177700285

Cited by 844 publications (413 citation statements)
References 0 publications
“…The value V_N(x) is the supremum of the expected total reward over the finite horizon N with the zero terminal value V_0. As was shown by Blackwell [3], the functions V_N, N = 1, 2, …”
Section: Theorem 13 If There Exists An Optimal Policy Then There Exis… (mentioning)
confidence: 65%
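The recursion behind these finite-horizon values is ordinary backward induction. The following is a minimal illustrative sketch, not taken from the cited papers: it assumes a small hypothetical finite MDP with transition array P, reward array r, and discount factor beta, and computes V_N starting from the zero terminal value V_0 = 0.

```python
# Minimal sketch (hypothetical finite MDP, not from the cited papers):
# V_N(x) = max_a [ r(x, a) + beta * sum_y P(y | x, a) * V_{N-1}(y) ],  with V_0 = 0.
import numpy as np

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x, a, y]
r = rng.uniform(-1.0, 1.0, size=(n_states, n_actions))            # bounded one-step rewards
beta = 0.9                                                         # discount factor in (0, 1)

V = np.zeros(n_states)                 # terminal value V_0 = 0
for N in range(1, 51):
    Q = r + beta * (P @ V)             # Q[x, a] = r(x, a) + beta * E[V_{N-1}(next state)]
    V = Q.max(axis=1)                  # V_N(x) = supremum over actions
```

With bounded rewards and beta < 1, the iterates V_N converge geometrically to the optimal discounted value, which is why truncating at a moderate horizon N already gives a usable approximation.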
“…For the classical dynamic programming problems introduced by Blackwell [3], the reward functions r(x, a) are assumed to be bounded, i.e., |r(x, a)| ≤ K, x ∈ X and a ∈ A(x), for some finite constant K. However, in many operations research applications the reward functions are bounded above, i.e., r(x, a) ≤ K when x ∈ X and a ∈ A(x). For example, in mathematical models of inventory and queueing systems, the one-step holding costs can tend to ∞ as the inventory levels or number of waiting customers increases to ∞.…”
Section: Countable State Space, 2.1 Definitions (mentioning)
confidence: 99%
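To make the bounded-above case concrete, here is a small hypothetical inventory-style reward (the names and numbers are illustrative, not from the cited papers): revenue is capped by the quantity offered for sale, while the holding cost grows linearly in the stock level, so r(x, a) ≤ K for a finite action set even though r(x, a) → −∞ as x → ∞.

```python
# Illustrative sketch of a reward bounded above but not below (hypothetical model):
def one_step_reward(x, a, price=1.0, holding_cost=0.1):
    """x = current stock level, a = units offered for sale this period."""
    sales = min(a, x)                      # cannot sell more than is on hand
    return price * sales - holding_cost * x
# With a finite action set (a <= a_max), r(x, a) <= price * a_max =: K,
# yet r(x, a) -> -infinity as the stock level x grows.
```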
“…Dynamic programming theory for such Markovian cases [Blackwell 1965, Ross 1970] verifies that a search for a policy that is optimal among all policies may be restricted to the search for a policy that is optimal among all stationary policies, i.e., policies that dictate actions only in accordance with the current state, independently of the current stage (year) or past history. Hence, the search for an optimal stationary policy undertaken in this thesis is in fact a search for a policy that is optimal without qualifications.…”
Section: Models Of Dynamic Programming With Markov Chains Have Been U… (mentioning)
confidence: 99%
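In code, a stationary policy is nothing more than a single state-to-action table consulted at every stage. Reusing the hypothetical arrays from the value-iteration sketch above, a greedy policy with respect to the (approximate) optimal value V is stationary by construction:

```python
# Greedy stationary policy from the approximate optimal value V (continues the
# hypothetical MDP sketch above): policy[x] depends only on the current state x,
# not on the stage or the past history.
policy = (r + beta * (P @ V)).argmax(axis=1)
```

Whether this greedy policy is in fact optimal is exactly what the discounted-dynamic-programming theory cited in the excerpt establishes; the snippet only illustrates that "stationary" means time- and history-independent.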