2013
DOI: 10.1561/2200000042

A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning

Cited by 97 publications (70 citation statements)
References 57 publications
“…There are many choices for removing constraints as shown in the next example. 6 }}, which also obeys the threshold.…”
Section: Approximating the Polyhedra Join (mentioning)
confidence: 79%
“…Typically the size of the state space is so large that it is not feasible to explicitly compute the Q-function for each state-action pair and thus the function is approximated. In this paper, we consider a linear function approximation of the Q-function for three reasons: (i) effectiveness: the approach is efficient, can handle large state spaces, and works well in practice [6]; (ii) it leverages our application domain: in our setting, it is possible to choose meaningful features (e.g., approximation of volume and cost of transformer) that relate to precision and performance of the static analysis, and thus it is not necessary to uncover them automatically (as done, e.g., by training a neural net); and (iii) interpretability of policy: once the Q-function and associated policy are learned, they can be inspected and interpreted. The Q-function is described as a linear combination of basis functions $\varphi_i : S \times A \to \mathbb{R}$, $i = 1, \ldots$…”
Section: Algorithm 1: Q-Learning Algorithm (mentioning)
confidence: 99%
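The passage above amounts to approximating the Q-function as a weighted sum of hand-chosen features, Q(s, a) ≈ θᵀφ(s, a), and updating the weights by Q-learning. The sketch below is a minimal illustration of that idea, not the cited paper's implementation; the function names, feature vectors, and step sizes are hypothetical.

```python
import numpy as np

def q_value(theta, phi_sa):
    """Linear Q-function: Q(s, a) = theta . phi(s, a)."""
    return float(np.dot(theta, phi_sa))

def q_learning_update(theta, phi_sa, reward, phi_next_greedy,
                      gamma=0.99, alpha=0.1):
    """One Q-learning step on the weight vector theta.

    phi_sa          -- feature vector phi(s, a) of the transition just taken
    phi_next_greedy -- phi(s', a*) for the greedy action a* in the next state
    """
    td_target = reward + gamma * q_value(theta, phi_next_greedy)
    td_error = td_target - q_value(theta, phi_sa)
    # Semi-gradient update: move theta along the feature vector by the TD error.
    return theta + alpha * td_error * phi_sa
```

Because the learned policy is just one weight per named feature, the vector theta can be inspected directly, which is the interpretability argument (iii) in the quoted passage.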
“…Another contribution is an application of RL to building control problems with occupant interactions based on the developed DP. RL is a family of unsupervised learning schemes for agents interacting with an unknown environment, and it has been widely studied in [29][30][31][32][33][34]. We assume that a stochastic model of occupant behavior is given and present illustrative scenarios where RL can be applied to building control systems with occupant interactions, assessing the potential of RL in those cases.…”
Section: Statement of Contributions (mentioning)
confidence: 99%
“…where $c_l = c - 0.5w$, $c_r = c + 0.5w$ (24) and $m = \dfrac{1}{1 + e^{-0.5sw}} - \dfrac{1}{1 + e^{0.5sw}}$ (25). Recall that the fuel consumption uncertainty is modeled as a random variable $w_v \sim \mathcal{N}(0, Q_v)$. Accordingly,…”
Section: Wind Model (mentioning)
confidence: 99%
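For reference, equations (24) and (25) as reconstructed above can be evaluated directly. The snippet below is a small sketch assuming c is an interval center, w its width, and s a steepness parameter of the logistic terms; the names mirror the quoted equations, and the surrounding wind and fuel-consumption model is not reproduced here.

```python
import math

def interval_edges(c, w):
    """Eq. (24): left/right edges of an interval of width w centered at c."""
    return c - 0.5 * w, c + 0.5 * w

def logistic_normalizer(s, w):
    """Eq. (25): m = 1/(1 + e^(-0.5*s*w)) - 1/(1 + e^(0.5*s*w))."""
    return 1.0 / (1.0 + math.exp(-0.5 * s * w)) - 1.0 / (1.0 + math.exp(0.5 * s * w))
```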
“…In the perfect state information case, Ure et al. [10] have developed a health-aware planning framework, which combines trajectory-sampling approximate MDP solvers [24] with learning-based adaptive control [25]. This work aims to extend health-aware planning methods to partially observable domains to handle challenging tasks such as persistent package delivery in a partially observable setting with health dynamics.…”
Section: Introduction (mentioning)
confidence: 99%