Proceedings of the 48th IEEE Conference on Decision and Control (CDC), held jointly with the 28th Chinese Control Conference, 2009
DOI: 10.1109/cdc.2009.5399685

Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Abstract: Neuro-dynamic programming is a class of powerful techniques for approximating the solution to dynamic programming equations. In their most computationally attractive formulations, these techniques provide the approximate solution only within a prescribed finite-dimensional function class. Thus, the question that always arises is how should the function class be chosen? The goal of this paper is to propose an approach using the solutions to associated fluid and diffusion approximations. In order to illustrate t…
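As a hedged sketch of the basic idea, the snippet below fits a value function by generic discounted LSTD(0) on a toy single-server queue, with the fluid-model value function included as one of the basis functions. This is an illustration under simplifying assumptions (a discounted toy queue, not the paper's power-management model or its average-cost setting):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy birth-death queue under a fixed policy (illustrative only):
# arrival w.p. a per step, service w.p. s > a when the queue is nonempty.
a, s, gamma = 0.3, 0.5, 0.95
cost = lambda x: float(x)

# Fluid-model value function for cost c(x) = x and net drift d = s - a:
# the fluid state decays linearly, so the integrated cost is x^2 / (2 d).
d = s - a
phi = lambda x: np.array([1.0, x, x * x / (2 * d)])  # basis: constant, linear, fluid

# Discounted LSTD(0): solve A w = b, where
#   A = E[phi(X) (phi(X) - gamma * phi(X'))^T],  b = E[phi(X) c(X)],
# with expectations replaced by sample-path averages.
A = np.zeros((3, 3))
b = np.zeros(3)
x = 0
for _ in range(50_000):
    x_next = max(x + (rng.random() < a) - (x > 0 and rng.random() < s), 0)
    f = phi(x)
    A += np.outer(f, f - gamma * phi(x_next))
    b += f * cost(x)
    x = x_next

w = np.linalg.solve(A, b)          # weights for the three basis functions
J = lambda x: phi(x) @ w           # fitted approximate value function
```

The design choice illustrated here is the paper's central one: rather than guessing a function class, the fluid value function `x**2 / (2*d)` is placed directly in the basis, so the linear architecture only has to learn corrections to it.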



Cited by 25 publications (39 citation statements)
References 27 publications
“…This construction is useful for establishing properties of the relative value function in the following result. Similar convexity results can be found for models in the queueing literature (see [4], [6], [11]). …”
Section: Optimality Equations (supporting)
confidence: 67%
“…Our approach is related to the fluid-model approximations for value functions in [2]- [4], previously applied to approximate dynamic programming approaches of [5]. Our approach is more similar to the recent work [6] in which the approximation to the ACOE is obtained through Taylor series approximations. We obtain approximate solutions to the resultant first order ODE, which in turn yield basis functions useful for LSTD-learning.…”
Section: Introduction (mentioning)
confidence: 81%
“…This choice of optimality criterion is motivated by the fact that the value function J * approximates the relative value function for an associated average-cost optimization problem for a stochastic model [8,30,31]. However, computation of J * is infeasible in all but the simplest models.…”
Section: Fluid Models (mentioning)
confidence: 99%
“…In fact, the estimation of cost-to-go functions in traditional online learning (e.g., Q-learning and Reinforcement learning [10][11][12]) requires sufficient observation of a sample-path such that it hits all the states of the FSM a large number of times. Approximations of cost-to-go functions [13,14] are generally based on oversimplified models and thus cannot be accurately used in general practical networks. For instance, the fluid approximation proposed in [14] is based on the assumption that the cost-to-go function is smooth in the state space of the FSM, meaning that only small variations of its value computed in neighboring states are allowed. This assumption is suitable for simple cases (e.g., buffer models and cost functions modeling buffer congestion), but does not hold for more complex FSM models and general cost functions.…”
Section: Introduction (mentioning)
confidence: 99%