Markov decision processes with a minimum-variance criterion (1987)
DOI: 10.1016/0022-247X(87)90332-5

Cited by 29 publications (25 citation statements) · References 7 publications
“…Research in this direction was initiated by Mandl [12], Jaquette [5], [6], [8], Benito [1], and Sobel [17]. More recent extensions of these results can be found in [11], [9], and [16]. In particular, these references consider the variance (or second moment) of the total expected discounted or average rewards of controlled, discrete-time Markov reward chains, in order to determine the 'best' policy within the class of discounted (or average) optimal policies, i.e. the one with a smaller variance (or lower second moment) of the cumulative reward.…”
Section: Motivation
confidence: 94%
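As a concrete illustration of the quantity these works minimize, the following minimal numpy sketch (our own, not taken from the cited paper; all names and numbers are made up) computes the mean and variance of the total discounted reward of a Markov reward chain under a fixed stationary policy, assuming the reward depends only on the current state. The mean solves the linear system V = (I - beta*P)^{-1} r, and the second moment solves an analogous linear system with discount beta^2, from which the variance follows.

```python
import numpy as np

def discounted_mean_and_variance(P, r, beta):
    """Mean and variance of X = sum_t beta^t r(s_t) for a Markov
    reward chain with transition matrix P, per-state rewards r, and
    discount factor 0 < beta < 1 (reward depends on the state only)."""
    n = len(r)
    I = np.eye(n)
    # First moment: V = (I - beta P)^{-1} r.
    V = np.linalg.solve(I - beta * P, r)
    # Second moment: M = r^2 + 2 beta r * (P V) + beta^2 P M,
    # another linear system, now with discount beta^2.
    M = np.linalg.solve(I - beta**2 * P, r**2 + 2 * beta * r * (P @ V))
    return V, M - V**2          # variance = E[X^2] - (E[X])^2

# Example with made-up numbers:
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
V, var = discounted_mean_and_variance(P, r, 0.95)
```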
“…Results related to those on selecting the 'best' optimal stationary policy with the smallest (minimal) variance in the discrete-time case can then be expected (see, e.g., [9], [11], and [12]). However, as this 'optimization' is notationally more complex and requires a number of technicalities and results from Markov decision theory, the details and results are left for further research.…”
Section: Remark 3.3 (Controlled Case)
confidence: 99%
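To make the selection of the 'best' optimal stationary policy concrete, here is a toy enumeration (our own construction; the MDP and all numbers are invented for illustration): among the stationary deterministic policies of a small discounted MDP, it keeps those attaining the maximal mean reward from a fixed initial state and, among them, picks the one with the smallest variance of the cumulative discounted reward.

```python
import itertools
import numpy as np

def mean_and_variance(P, r, beta):
    # Mean V and variance of the total discounted reward of a fixed
    # stationary policy (Markov reward chain with matrix P, rewards r).
    n = len(r)
    I = np.eye(n)
    V = np.linalg.solve(I - beta * P, r)
    M = np.linalg.solve(I - beta**2 * P, r**2 + 2 * beta * r * (P @ V))
    return V, M - V**2

# Toy 4-state, 2-action MDP (all numbers made up). Both actions agree
# everywhere except in state 0, where action 0 moves deterministically
# to state 2 and action 1 randomizes between states 1 and 3.
n_states, n_actions = 4, 2
P_a = np.zeros((n_actions, n_states, n_states))
P_a[0, 0, 2] = 1.0                      # action 0 in state 0
P_a[1, 0, 1] = P_a[1, 0, 3] = 0.5       # action 1 in state 0
for a in range(n_actions):              # states 1, 2, 3: go to sink 3
    P_a[a, 1, 3] = P_a[a, 2, 3] = P_a[a, 3, 3] = 1.0
r = np.array([0.0, 4.0, 2.0, 0.0])      # reward depends on the state only
beta, s0 = 0.95, 0

results = []
for policy in itertools.product(range(n_actions), repeat=n_states):
    P = np.stack([P_a[a, s] for s, a in enumerate(policy)])
    V, var = mean_and_variance(P, r, beta)
    results.append((policy, V[s0], var[s0]))

best_mean = max(m for _, m, _ in results)
optimal = [t for t in results if abs(t[1] - best_mean) < 1e-9]
policy, mean, var = min(optimal, key=lambda t: t[2])
print(policy, mean, var)
```

In this toy chain both actions in state 0 yield the same mean (beta * 2 = 1.9), but the deterministic transition has zero variance while the randomized one has variance beta^2 * 4, so the enumeration selects a policy taking action 0 in state 0.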
“…To the best of our knowledge, most of the aforementioned works on MDPs focus on solving mean-variance problems in discrete-time MDPs (DTMDPs) [3,4,6,13,14,21,24,25,28,30,32] as well as in continuous-time MDPs (CTMDPs) [8,9,10,11,12,20,27]; only a few works address mean-variance problems in semi-Markov decision processes (SMDPs); see [2,28] for finite SMDPs and [19] for a finite time horizon. Moreover, it should be noted that most of the existing works on mean-variance problems for MDPs deal with fixed finite or infinite time horizons.…”
Section: Introduction
confidence: 99%
“…Mean-variance problems arise from the tradeoff between the mean and the variance: a risk-averse investor usually prefers a return lower than the maximal one in order to keep the variance risk smaller. For this reason, mean-variance problems have been widely studied for various dynamic systems, described by stochastic differential equations [5,7,22,31], Markov decision processes (MDPs) [2,3,8,10,13,21,27,28], and so on.…”
Section: Introduction
confidence: 99%