49th IEEE Conference on Decision and Control (CDC) 2010
DOI: 10.1109/cdc.2010.5717627

State aggregation based linear programming approach to approximate dynamic programming

Abstract: One often encounters the curse of dimensionality in the application of dynamic programming to determine optimal policies for controlled Markov chains. In this paper, we provide a method to construct sub-optimal policies along with a bound for the deviation of such a policy from the optimum through the use of restricted linear programming. The novelty of this approach lies in circumventing the need for a value iteration or a linear program defined on the entire state-space. Instead, the state-space is …
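The restricted-LP idea in the abstract can be made concrete with a toy example. The sketch below is a minimal illustration, not the paper's algorithm: it builds the exact LP formulation of a small discounted MDP and restricts the value function to be constant on each aggregate of a state partition. The random MDP, the partition, and all names (P, r, part, gamma) are illustrative assumptions.

```python
# Toy sketch of a state-aggregation restricted LP for a discounted MDP.
# Illustrative only; not the cited paper's exact formulation.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 12, 3, 0.9

# Random discounted MDP: P[u, s, s'] transition kernel, r[s, u] rewards.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((n_states, n_actions))

# Partition the state space into m aggregates; part[s] = aggregate of s.
m = 4
part = np.arange(n_states) % m

# Exact-LP constraints  V(s) >= r(s,u) + gamma * sum_s' P(s'|s,u) V(s')
# collapse, under V(s) = a(part[s]), to constraints on the m aggregate
# values:  a(part[s]) - gamma * sum_j Phat(j|s,u) a(j) >= r(s,u).
A_ub, b_ub = [], []
for s in range(n_states):
    for u in range(n_actions):
        row = np.array([gamma * P[u, s, part == j].sum() for j in range(m)])
        row[part[s]] -= 1.0              # row @ a <= -r(s,u)
        A_ub.append(row)
        b_ub.append(-r[s, u])

# Positive state-relevance weights, summed over each aggregate.
c = np.array([(part == j).sum() for j in range(m)], dtype=float)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * m)
V_upper = res.x[part]                    # piecewise-constant bound on V*
print(res.status, V_upper.round(3))
```

The restricted LP has only m variables and one constraint per (state, action) pair, which is the computational point the abstract makes: no value iteration or LP over the full state space is needed, at the cost of a piecewise-constant approximation.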

Cited by 26 publications (11 citation statements). References 19 publications.
“…An entirely analogous argument establishes that $(T V_2)(x) - (T V_1)(x)$ is bounded above by the same final term in (31). Hence the result for $M = 1$ follows as, …”
Section: Appendix E, Proof of Lyapunov-Based Bound (mentioning)
confidence: 66%
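The quoted inequality is truncated, and the bound (31) from the citing paper is not reproduced in this excerpt. The step it invokes is, in standard form, the sup-norm contraction of the Bellman operator $T$; a sketch under the usual discounted-MDP assumptions, with $u^*$ the action achieving the maximum in $(T V_2)(x)$:

```latex
(T V_2)(x) - (T V_1)(x)
  \le \gamma \sum_{s'} P(s' \mid x, u^*)\,\bigl(V_2(s') - V_1(s')\bigr)
  \le \gamma \,\lVert V_2 - V_1 \rVert_\infty .
```

Exchanging $V_1$ and $V_2$ gives the symmetric bound, hence $\lVert T V_2 - T V_1 \rVert_\infty \le \gamma \lVert V_2 - V_1 \rVert_\infty$, which is the "entirely analogous argument" pattern the quote describes.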
“…$V(s) \ge V^*(s)$ for all states $s \in S$. Indeed, as is well known, one can pose the original problem as a Linear Program (LP) by minimizing a weighted sum (with positive weights) of the value function, $\sum_{s \in S} c(s) V(s)$, subject to the above constraints [15,16]. In a related publication, the authors have combined the aggregation and LP methods to solve the perimeter patrol problem [17]. Now, with the restriction that states in a given partition have the same value, $V(s) = a(i)$ for all $s \in S_i$, inequalities (17) take the form, …”
Section: Value Iteration for the Reduced-Order MDP (mentioning)
confidence: 97%
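The inequalities the quote labels (17) are not reproduced in this excerpt, but the restriction it describes has a standard form. A sketch, assuming the usual exact-LP constraints $V(s) \ge r(s,u) + \gamma \sum_{s'} P(s' \mid s,u) V(s')$: substituting $V(s) = a(i)$ for $s \in S_i$ collapses them to one inequality per (state, action) pair over the aggregate values,

```latex
a(i) \;\ge\; r(s,u) + \gamma \sum_{j} \Bigl( \sum_{s' \in S_j} P(s' \mid s, u) \Bigr) a(j),
\qquad \forall\, s \in S_i,\ \forall\, u,
```

and the LP objective becomes $\min \sum_i w_i\, a(i)$ with aggregate weights $w_i = \sum_{s \in S_i} c(s)$. Any feasible solution is then an upper bound on $V^*$ that is constant on each partition.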
“…Computing a solution of problem (5) poses the following difficulties:
(D1) $\mathcal{F}(\mathcal{X})$ is an infinite-dimensional space;
(D2) objective (5a) involves a multidimensional integral over $\mathcal{X}$;
(D3) the $T_u$-operator involves a multidimensional integral over $\Xi$;
(D4) constraint (5c) involves an infinite number of constraints;
(D5) constraint (5c) is non-convex in the decision variables;
(D6) objective (5a) involves the maximization of a convex function.
Difficulties (D1)-(D4) also apply to problem (4), and a variety of approaches have been proposed to address them; see for example [27]-[31]. In Section III we take inspiration from previous approaches to propose an approximation algorithm that additionally overcomes difficulties (D5)-(D6).…”
Section: Point-wise Maximum Formulation of DP (mentioning)
confidence: 99%
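One standard family of remedies for (D1)-(D4), of the kind the quote attributes to prior work, restricts the value function to a finite basis and enforces the Bellman constraints only at sampled states (constraint sampling). The sketch below is a toy illustration under that assumption; the dynamics, features, and names (phi, step, actions) are invented for the example and are not from the cited papers.

```python
# Toy sketch of basis restriction plus constraint sampling for an
# approximate-LP formulation of DP.  Illustrative assumptions throughout.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
gamma, n_samples, n_basis = 0.95, 200, 5
actions = (-1.0, 0.0, 1.0)

def phi(x):
    # Polynomial features on a 1-D state space: a finite-dimensional
    # subspace standing in for the infinite-dimensional F(X) of (D1).
    return np.array([x ** k for k in range(n_basis)])

def step(x, u):
    # Toy deterministic dynamics and reward on [0, 1].
    return float(np.clip(0.9 * x + 0.1 * u, 0.0, 1.0)), -(x - 0.5) ** 2

# Sampled states replace the integral objective (D2) and the infinite
# constraint family (D4) with finite surrogates.
X = rng.random(n_samples)

A_ub, b_ub = [], []
for x in X:
    for u in actions:
        x_next, rew = step(x, u)
        # Enforce phi(x)'w >= rew + gamma * phi(x_next)'w at each sample,
        # i.e. (gamma * phi(x_next) - phi(x))'w <= -rew: linear in w.
        A_ub.append(gamma * phi(x_next) - phi(x))
        b_ub.append(-rew)

# Objective: sampled surrogate of the weighted integral of V over X.
c = np.mean([phi(x) for x in X], axis=0)

# Box bounds guard against unboundedness of the sampled LP.
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(-100.0, 100.0)] * n_basis)
print(res.status, res.x)   # basis weights of the value approximation
```

Because the constraints are linear in the basis weights, this surrogate is a finite LP; it addresses (D1)-(D4) but not the non-convexity issues (D5)-(D6), which is exactly the gap the quoted passage says the citing paper's algorithm targets.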