1993
DOI: 10.1137/0331018

Discrete-Time Controlled Markov Processes with Average Cost Criterion: A Survey

Cited by 529 publications (305 citation statements)
References 156 publications
“…The literature on average cost MDPs is vast. Most of the earlier results are surveyed in Arapostathis et al [1]. Here, we mention just a few references.…”
mentioning
confidence: 93%
“…245-247) is discussed in the online supplement. For average reward, a popular approach, called the vanishing-discount approach (Arapostathis et al, 1993), employs any discounted RL algorithm with a very small positive value for R. Alternatively, one can use R-SMART (Gosavi, 2004a) that differs from Q-Learning (Figure 6) for MDPs as follows. In Step 1, ρ, which denotes the current estimate of the optimal average reward, is set to 0 along with T r and T t , which are also set to 0.…”
Section: Semi-Markov Decision Problems
mentioning
confidence: 99%
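The vanishing-discount idea mentioned in the statement above can be sketched numerically: solve the discounted problem with a discount factor β close to 1 and read off (1 − β)·V_β as an estimate of the optimal average reward. The sketch below is only an illustration under stated assumptions. The two-state MDP, its numbers, and all function names are hypothetical and do not come from the survey or the citing papers, and it uses plain value iteration with known dynamics rather than a reinforcement-learning algorithm. Relative value iteration is included purely as a cross-check.

```python
# Hypothetical 2-state, 2-action MDP, used only for illustration.
# P[s][a] is the next-state distribution; R[s][a] is the one-step reward.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
R = [
    [1.0, 0.0],
    [2.0, 3.0],
]
STATES = range(2)
ACTIONS = range(2)


def q_value(s, a, v, beta):
    """One-step reward plus beta times the expected value of the next state."""
    return R[s][a] + beta * sum(P[s][a][t] * v[t] for t in STATES)


def discounted_value_iteration(beta, tol=1e-10, max_iter=1_000_000):
    """Standard value iteration for the beta-discounted problem."""
    v = [0.0 for _ in STATES]
    for _ in range(max_iter):
        new_v = [max(q_value(s, a, v, beta) for a in ACTIONS) for s in STATES]
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return new_v
        v = new_v
    return v


def relative_value_iteration(tol=1e-12, max_iter=100_000):
    """Average-reward cross-check: iterate h <- T h - (T h)(0), with state 0 as reference."""
    h = [0.0 for _ in STATES]
    rho = 0.0
    for _ in range(max_iter):
        t = [max(q_value(s, a, h, 1.0) for a in ACTIONS) for s in STATES]
        rho = t[0]  # h[0] stays 0, so (T h)(0) is the average-reward estimate
        new_h = [t[s] - t[0] for s in STATES]
        if max(abs(new_h[s] - h[s]) for s in STATES) < tol:
            return rho, new_h
        h = new_h
    return rho, h


def vanishing_discount_estimate(beta=0.999):
    """Estimate the optimal average reward as (1 - beta) * V_beta(0)."""
    v = discounted_value_iteration(beta)
    return (1.0 - beta) * v[0]


if __name__ == "__main__":
    rho, _ = relative_value_iteration()
    print("relative value iteration:", rho)
    print("vanishing-discount estimate:", vanishing_discount_estimate())
```

The two estimates should agree up to an error on the order of 1 − β, which is why a discount factor very close to 1 is needed; the price is slower convergence of the discounted iteration.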
“…For a more substantial introduction, see Puterman's book [33] or the survey paper by Arapostathis et al [1]. We consider an MDP with a countable state set X, a finite action set A, a nonnegative and bounded reward function R such that R : X × A → ℝ₊, and a state transition function P that maps the state and action pair to a probability distribution over X.…”
Section: Markov Decision Process
mentioning
confidence: 99%
“…(2), obtaining a function h^π and J^π_∞ for π. Note that the function h^π that satisfies the Poisson's equation with respect to π is not necessarily unique [1,33]. Under Assumption 2.1, the following function known as the "relative value function"…”
Section: Parallel Rollout
mentioning
confidence: 99%