2011
DOI: 10.1017/s002190020000855x
Sensitivity Analysis in Markov Decision Processes with Uncertain Reward Parameters

Abstract: Sequential decision problems can often be modeled as Markov decision processes. Classical solution approaches assume that the parameters of the model are known. However, model parameters are usually estimated and uncertain in practice. As a result, managers are often interested in how estimation errors affect the optimal solution. In this paper we illustrate how sensitivity analysis can be performed directly for a Markov decision process with uncertain reward parameters using the Bellman equations. In particular…
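One concrete way to see what such a sensitivity analysis can look like is the policy-evaluation form of the Bellman equations: for a fixed policy the value function is linear in the reward vector, so its Jacobian with respect to the rewards is available in closed form. The sketch below illustrates only this general idea, not the paper's actual procedure; the three-state chain, discount factor, and reward estimates are invented for illustration.

```python
import numpy as np

# Minimal sketch of reward sensitivity via the policy-evaluation (Bellman)
# equation.  For a fixed policy pi,
#     v = r_pi + beta * P_pi @ v   =>   v = inv(I - beta * P_pi) @ r_pi,
# so dv/dr_pi equals inv(I - beta * P_pi): entry (s, s') says how much an
# estimation error in the reward of state s' shifts the value of state s.
# The chain, discount factor, and rewards below are assumed, not from the paper.

beta = 0.9                                    # discount factor (assumed)
P_pi = np.array([[0.7, 0.2, 0.1],             # transitions under the fixed policy
                 [0.1, 0.8, 0.1],
                 [0.3, 0.3, 0.4]])
r_pi = np.array([1.0, 0.5, -0.2])             # estimated per-state rewards

sensitivity = np.linalg.inv(np.eye(3) - beta * P_pi)   # Jacobian dv/dr_pi
v = sensitivity @ r_pi                                   # value function

print("value function:", v)
print("dv[0]/dr:", sensitivity[0])            # sensitivity of state 0's value
```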

Cited by 4 publications (4 citation statements)
References 20 publications (11 reference statements)
“…To this end, the weight α in the cost functional L is used as tuning parameter for the online implementation. The approach suits this problem due to the strong continuity in the cost function, which implies that the value function is continuous and therefore the approach is robust against disturbances, such as variations in the A/C settings [31]. The tradeoff weight in the MPC is defined as α′ to distinguish between the value used in the DP:…”
Section: Results (mentioning)
confidence: 99%
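The continuity argument in the excerpt above can be illustrated with a small dynamic program: if the stage cost is a weighted combination L = α·c_energy + (1 − α)·c_comfort, the backward-DP value function changes only slightly when α is perturbed slightly. Everything in the sketch below (dynamics, cost terms, horizon) is a made-up stand-in for the cited application, purely to show the qualitative behaviour.

```python
import numpy as np

# Sketch of the continuity argument: the finite-horizon DP value function
# varies continuously with the tradeoff weight alpha in the stage cost
# L = alpha * c_energy + (1 - alpha) * c_comfort.  All data below is invented.

np.random.seed(0)
n_states, n_actions, horizon = 5, 3, 20
P = np.random.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, :]
c_energy = np.random.rand(n_states, n_actions)
c_comfort = np.random.rand(n_states, n_actions)

def dp_value(alpha):
    """Backward DP for the weighted stage cost; returns the initial value function."""
    L = alpha * c_energy + (1.0 - alpha) * c_comfort
    v = np.zeros(n_states)
    for _ in range(horizon):
        q = L + np.einsum('ast,t->sa', P, v)   # expected cost-to-go per (state, action)
        v = q.min(axis=1)                      # cost minimisation
    return v

for a1, a2 in [(0.50, 0.51), (0.50, 0.60)]:
    gap = np.max(np.abs(dp_value(a1) - dp_value(a2)))
    print(f"alpha {a1:.2f} -> {a2:.2f}: max value change {gap:.4f}")
```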
“…It is important to consider both of these types of changes because it may be the case that changes in a specific parameter will affect the optimal value function significantly, whereas the optimal policy may not be sensitive to these changes. Tan and Hartman [73] explore the inverse problem of determining the degree to which reward parameters can vary without the optimal policy changing.…”
Section: Quantifying Model Uncertainty (mentioning)
confidence: 99%
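A brute-force version of that inverse question is easy to state: perturb one reward estimate over a grid, re-solve the MDP, and record the range over which the greedy policy stays the same. The sketch below does exactly that for an invented two-state, two-action MDP; it is not Tan and Hartman's method, which works directly with the Bellman equations rather than by repeated re-solving.

```python
import numpy as np

# Illustrative brute-force check: how far can a single reward entry move before
# the optimal policy changes?  The MDP, discount factor, and grid are assumed.

beta = 0.9
# P[a, s, s'] and R[s, a] for a made-up 2-state, 2-action MDP
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.3],
              [0.0, 0.8]])

def optimal_policy(R):
    """Value iteration on the assumed MDP; returns the greedy policy for R."""
    v = np.zeros(2)
    for _ in range(2000):
        q = R + beta * np.einsum('ast,t->sa', P, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < 1e-10:
            break
        v = v_new
    return tuple(q.argmax(axis=1))

def policy_after_perturbing(delta):
    Rp = R.copy()
    Rp[0, 0] += delta                      # perturb a single reward estimate
    return optimal_policy(Rp)

base = optimal_policy(R)
deltas = np.linspace(-2.0, 2.0, 401)       # grid with step 0.01
stable = [d for d in deltas if policy_after_perturbing(d) == base]
print("nominal optimal policy:", base)
print(f"policy unchanged for R[0,0] shifts in about "
      f"[{min(stable):.2f}, {max(stable):.2f}]")
```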
“…where we utilize the fact that $\sum_{i\in S, a\in A} \beta y^2 x(i,a) = \beta y^2$, the S-by-SA matrix A and the S-dimensional column vector b are determined by the constraint equations in (11), $c = \beta r^2_{\odot} - r$, $c' = -2\beta r$, and r and x are SA-dimensional column vectors with elements r(i, a) and x(i, a), respectively. We observe that the right-hand side of (12) is a parametric linear program (PLP) (Gal and Greenberg, 1997; Tan and Hartman, 2011) with a linear parameter y. Below we do sensitivity analysis for this PLP problem.…”
Section: Lemma 2 (Critical Points) There Exists a Series of Intervals (mentioning)
confidence: 99%
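The excerpt above reduces the sensitivity question to a parametric linear program in a scalar parameter y. A crude way to locate the "critical points" of such a PLP, the parameter values at which the optimal basis changes, is to sweep y and re-solve. The sketch below does this with randomly generated stand-in data; A, b, c, and c′ here are not the quantities from the cited derivation.

```python
import numpy as np
from scipy.optimize import linprog

# Brute-force look at parametric-LP sensitivity: minimise (c + y * c_prime) @ x
# subject to A x = b, x >= 0, sweeping the scalar parameter y and flagging the
# values where the optimal support (basis) changes.  All data here is random.

rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.normal(size=(m, n))
x_feas = rng.uniform(0.5, 1.5, size=n)       # strictly positive feasible point
b = A @ x_feas                               # guarantees feasibility
c, c_prime = rng.normal(size=n), rng.normal(size=n)

def support(y, tol=1e-8):
    """Solve the LP at parameter y and return the support of the optimal x."""
    res = linprog(c + y * c_prime, A_eq=A, b_eq=b,
                  bounds=[(0, None)] * n, method="highs")
    return frozenset(np.flatnonzero(res.x > tol)) if res.success else None

prev, critical = None, []
for y in np.linspace(-2.0, 2.0, 201):
    s = support(y)
    if prev is not None and s is not None and s != prev:
        critical.append(round(y, 2))          # basis changed near this y
    prev = s
print("approximate critical points of the PLP:", critical)
```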