“…For the classical dynamic programming problems introduced by Blackwell [3], the reward functions r(x, a) are assumed to be bounded, i.e., |r(x, a)| ≤ K for all x ∈ X and a ∈ A(x), for some finite constant K. However, in many operations research applications the reward functions are only bounded above, i.e., r(x, a) ≤ K for all x ∈ X and a ∈ A(x). For example, in mathematical models of inventory and queueing systems, the one-step holding costs can tend to ∞, so that the corresponding rewards (the negatives of the costs) tend to −∞, as the inventory level or the number of waiting customers increases to ∞.…”
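
As an illustration of the distinction drawn in the excerpt, the following minimal sketch evaluates a reward that is bounded above but not below. The linear form of the cost, the function name `reward`, and the parameters `h` (holding cost per unit) and `c` (ordering cost per unit) are assumptions made for this example only; they are not taken from the cited paper.

```python
def reward(x: int, a: int, h: float = 1.0, c: float = 2.0) -> float:
    """Hypothetical one-step reward r(x, a) for inventory level x and order size a.

    The reward is the negative of an assumed linear holding-plus-ordering cost,
    so it never exceeds 0 but decreases without bound as x grows.
    """
    return -(h * x + c * a)


# Upper bound K from the excerpt; for this assumed cost, K = 0 works.
K = 0.0

# Evaluate the reward at increasing inventory levels with no order placed.
levels = [0, 10, 1_000, 1_000_000]
rewards = [reward(x, a=0) for x in levels]

for x, r in zip(levels, rewards):
    print(f"x = {x:>9}: r(x, 0) = {r}")
```

Every value satisfies r(x, a) ≤ K, yet no finite constant bounds |r(x, a)| over all x, which is the situation that Blackwell's bounded-reward assumption excludes.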