2013
DOI: 10.1561/2200000042

A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning

Cited by 97 publications (70 citation statements)
References 57 publications
“…There are many choices for removing constraints as shown in the next example. 6 }}, which also obeys the threshold.…”
Section: Approximating the Polyhedra Join (mentioning)
confidence: 79%
“…Typically the size of the state space is so large that it is not feasible to explicitly compute the Q-function for each state-action pair and thus the function is approximated. In this paper, we consider a linear function approximation of the Q-function for three reasons: (i) effectiveness: the approach is efficient, can handle large state spaces, and works well in practice [6]; (ii) it leverages our application domain: in our setting, it is possible to choose meaningful features (e.g., approximation of volume and cost of transformer) that relate to precision and performance of the static analysis, and thus it is not necessary to uncover them automatically (as done, e.g., by training a neural net); and (iii) interpretability of policy: once the Q-function and associated policy are learned, they can be inspected and interpreted. The Q-function is described as a linear combination of basis functions $\varphi_i : S \times A \to \mathbb{R}$, $i = 1, \ldots$…”
Section: Algorithm 1: Q-Learning Algorithm (mentioning)
confidence: 99%
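The passage above amounts to approximating the Q-function as a weighted sum of hand-chosen features, Q(s, a) ≈ θᵀφ(s, a), and updating the weights by Q-learning. The sketch below is a minimal illustration of that idea, not the cited paper's implementation; the function names, feature vectors, and step sizes are hypothetical.

```python
import numpy as np

def q_value(theta, phi_sa):
    """Linear Q-function: Q(s, a) = theta . phi(s, a)."""
    return float(np.dot(theta, phi_sa))

def q_learning_update(theta, phi_sa, reward, phi_next_greedy,
                      gamma=0.99, alpha=0.1):
    """One Q-learning step on the weight vector theta.

    phi_sa          -- feature vector phi(s, a) of the transition just taken
    phi_next_greedy -- phi(s', a*) for the greedy action a* in the next state
    """
    td_target = reward + gamma * q_value(theta, phi_next_greedy)
    td_error = td_target - q_value(theta, phi_sa)
    # Semi-gradient update: move theta along the feature vector by the TD error.
    return theta + alpha * td_error * phi_sa
```

Because the learned policy is just one weight per named feature, the vector theta can be inspected directly, which is the interpretability argument (iii) in the quoted passage.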
“…Another contribution is an application of RL to building control problems with occupant interactions based on the developed DP. RL is a family of unsupervised learning schemes for agents interacting with an unknown environment, and it has been widely studied in [29][30][31][32][33][34]. We assume that a stochastic model of occupant behavior is given and present illustrative scenarios where RL can be applied to building control systems with occupant interactions, assessing the potential of RL in those cases.…”
Section: Statement of Contributions (mentioning)
confidence: 99%
“…where $c_l = c - 0.5w$, $c_r = c + 0.5w$ (24) and $m = \dfrac{1}{1 + e^{-0.5sw}} - \dfrac{1}{1 + e^{0.5sw}}$ (25). Recall that the fuel consumption uncertainty is modeled as a random variable $w_v \sim \mathcal{N}(0, Q_v)$. Accordingly,…”
Section: Wind Model (mentioning)
confidence: 99%
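For reference, equations (24) and (25) as reconstructed above can be evaluated directly. The snippet below is a small sketch assuming c is an interval center, w its width, and s a steepness parameter of the logistic terms; the names mirror the quoted equations, and the surrounding wind and fuel-consumption model is not reproduced here.

```python
import math

def interval_edges(c, w):
    """Eq. (24): left/right edges of an interval of width w centered at c."""
    return c - 0.5 * w, c + 0.5 * w

def logistic_normalizer(s, w):
    """Eq. (25): m = 1/(1 + e^(-0.5*s*w)) - 1/(1 + e^(0.5*s*w))."""
    return 1.0 / (1.0 + math.exp(-0.5 * s * w)) - 1.0 / (1.0 + math.exp(0.5 * s * w))
```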
“…In the perfect state information case, Ure et al. [10] have developed a health-aware planning framework, which combines trajectory-sampling approximate MDP solvers [24] with learning-based adaptive control [25]. This work aims to extend health-aware planning methods to partially observable domains to handle challenging tasks such as persistent package delivery in a partially observable setting with health dynamics.…”
Section: Introduction (mentioning)
confidence: 99%