Robotics: Science and Systems X 2014
DOI: 10.15607/rss.2014.x.039

Probably Approximately Correct MDP Learning and Control With Temporal Logic Constraints

Abstract: We consider synthesis of controllers that maximize the probability of satisfying given temporal logic specifications in unknown, stochastic environments. We model the interaction between the system and its environment as a Markov decision process (MDP) with initially unknown transition probabilities. The solution we develop builds on the so-called model-based probably approximately correct Markov decision process (PAC-MDP) method. The algorithm attains an ε-approximately optimal policy with probabilit…
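The model-based PAC-MDP approach outlined in the abstract can be illustrated with a small R-max-style sketch: state-action pairs are treated optimistically until they have been sampled enough times, and a policy maximizing the probability of reaching accepting states of a product MDP is computed on the resulting model. This is a minimal sketch under assumed interfaces (the class name, the m_known threshold, and the planning routine are illustrative), not the authors' implementation.

```python
from collections import defaultdict


class RMaxReachabilityLearner:
    """R-max-style PAC-MDP sketch for a reachability objective on a product MDP.

    Illustrative assumptions: state and action sets are given explicitly, and a
    state-action pair counts as "known" after m_known samples.
    """

    def __init__(self, states, actions, accepting, m_known=50, n_iters=200):
        self.states = list(states)
        self.actions = list(actions)
        self.accepting = set(accepting)       # states where the LTL-derived objective is met
        self.m_known = m_known                # samples before (s, a) counts as "known"
        self.n_iters = n_iters
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> s' -> count
        self.totals = defaultdict(int)                        # (s, a) -> total samples

    def observe(self, s, a, s_next):
        """Record one sampled transition."""
        self.counts[(s, a)][s_next] += 1
        self.totals[(s, a)] += 1

    def _p_hat(self, s, a):
        """Empirical transition distribution of a sufficiently sampled (s, a)."""
        total = self.totals[(s, a)]
        return {s2: c / total for s2, c in self.counts[(s, a)].items()}

    def _q(self, v, s, a):
        """Optimistic Q-value: unknown pairs are assumed to reach the goal surely."""
        if self.totals[(s, a)] < self.m_known:
            return 1.0
        return sum(p * v[s2] for s2, p in self._p_hat(s, a).items())

    def plan(self):
        """Value iteration for maximum reachability probability on the optimistic model."""
        v = {s: (1.0 if s in self.accepting else 0.0) for s in self.states}
        for _ in range(self.n_iters):
            for s in self.states:
                if s not in self.accepting:
                    v[s] = max(self._q(v, s, a) for a in self.actions)
        policy = {s: max(self.actions, key=lambda a: self._q(v, s, a)) for s in self.states}
        return policy, v
```

The optimistic value of 1.0 for under-sampled pairs is what drives exploration toward them, which is the essence of the R-max construction the paper builds on.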

Cited by 110 publications (107 citation statements)
References 22 publications (25 reference statements)
“…The underlying task is to learn these probabilities and compute a policy that maximizes the probability of reaching the target. We use a PAC-MDP learning algorithm similar to that shown in [Fu and Topcu, 2014]. In order to mitigate the high sampling requirement and learning time mentioned earlier, we apply the proposed reduction technique to reduce the distributions that need to be sampled without sacrificing the PAC guarantees.…”
Section: Reductions in Gridworlds with LTL Objective
confidence: 99%
“…PAC learning. We now run a modified version of the R-max learning algorithm presented in [Fu and Topcu, 2014] on one of the reduced 10 × 10 MDPs. Explicitly, we aim to learn a policy that with probability at least 1 − δ will be ε-optimal in maximizing reachability probability.…”
Section: Learning in Gridworlds with LTL Objectives
confidence: 99%
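As a rough illustration of the kind of gridworld experiment this citation statement describes, the sketch above can be exercised on a toy slip-dynamics gridworld. The environment, episode loop, and parameters below are assumptions for demonstration only, not the reduced 10 × 10 MDPs or the modified R-max variant used in the citing paper.

```python
import random


def slip_step(s, a, n=5, slip=0.1):
    """Toy gridworld dynamics: intended move with probability 1 - slip, stay put otherwise."""
    if random.random() < slip:
        return s
    dx, dy = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}[a]
    x, y = s
    return (min(max(x + dx, 0), n - 1), min(max(y + dy, 0), n - 1))


states = [(x, y) for x in range(5) for y in range(5)]
actions = ["N", "S", "E", "W"]
learner = RMaxReachabilityLearner(states, actions, accepting={(4, 4)}, m_known=20)

for _ in range(500):                       # exploration episodes
    s = (0, 0)
    policy, _ = learner.plan()             # re-plan on the current optimistic model
    for _ in range(30):                    # bounded-length episode
        a = policy[s]
        s_next = slip_step(s, a)
        learner.observe(s, a, s_next)
        s = s_next
        if s in learner.accepting:
            break

policy, values = learner.plan()
print("Estimated max reachability probability from (0, 0):", values[(0, 0)])
```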
“…Safe or constrained (e.g., by temporal logic specifications) exploration has also been investigated in the learning literature. Some recent examples include [13,14]. An overview on safe exploration using reinforcement learning can be found in [15].…”
Section: Introduction
confidence: 99%
“…For instance, [42] focuses on fusing human and machine "perception." Likewise, attempts to blend human and machine "decision making" occur in the machine learning [60], [36], control theory [29], and human robot interaction literature [32]. A special case of shared decision making is shared control: fuse human and robot platform commands.…”
Section: Introduction
confidence: 99%