A survey of point-based POMDP solvers

Shani, Guy; Pineau, Joëlle; Kaplow, Robert

doi:10.1007/s10458-012-9200-2

Cited by 360 publications (300 citation statements; 290 mentioning)

References 22 publications


“…For the purpose of this work, we are using MCVI as a method to solve factored MDPs and demonstrate our technique for refinement on a large problem. We leave the question of determining an optimal set of states for MCVI as future work, though we note that this question has been extensively studied in point-based value iteration algorithms for POMDPs [14].…”

Section: Experimental Results On Diagnostic Problems (mentioning, confidence: 99%)

Iterative Model Refinement of Recommender MDPs Based on Expert Feedback

Khan, Poupart, Agosta (2013). Advanced Information Systems Engineering.


Abstract. In this paper, we present a method to iteratively refine the parameters of a Markov Decision Process by leveraging constraints implied by an expert's review of the policy. We impose a constraint on the parameters of the model for every case where the expert's recommendation differs from the recommendation of the policy. We demonstrate that consistency with an expert's feedback leads to non-convex constraints on the model parameters. We refine the parameters of the model, under these constraints, by partitioning the parameter space and iteratively applying alternating optimization. We demonstrate how the approach can be applied to both flat and factored MDPs and present results based on diagnostic sessions from a manufacturing scenario.


“…Like MDPs [5] and POMDPs [8], [10], dynamic programming methods have been used in the context of Dec-POMDPs [38]. Here, a set of T-step policy trees, one for each agent, is generated from the bottom up.…”

Section: A. Optimal Approaches (mentioning, confidence: 99%)

“…Since then, several solution strategies that focus on the efficiency and feasibility of obtaining a solution have been explored for POMDPs in the AI community [8]-[10]. … also been tackled in the control systems literature.…”

Section: Introduction (mentioning, confidence: 99%)

Decentralized control of partially observable Markov decision processes

Amato, Chowdhary, Geramifard, et al. (2013). 52nd IEEE Conference on Decision and Control.


Abstract. Markov decision processes (MDPs) are often used to model sequential decision problems involving uncertainty under the assumption of centralized control. However, many large, distributed systems do not permit centralized control due to communication limitations (such as cost, latency or corruption). This paper surveys recent work on decentralized control of MDPs in which control of each agent depends on a partial view of the world. We focus on a general framework where there may be uncertainty about the state of the environment, represented as a decentralized partially observable MDP (Dec-POMDP), but consider a number of subclasses with different assumptions about uncertainty and agent independence. In these models, a shared objective function is used, but plans of action must be based on a partial view of the environment. We describe the frameworks, along with the complexity of optimal control and important properties. We also provide an overview of exact and approximate solution methods as well as relevant applications. This survey provides an introduction to what has become an active area of research on these models and their solutions.


“…Most research has focused on determining the best set of belief points [6][7][8] to be evaluated in VI. These methods rely on exploratory/search heuristics to discover a sufficient set of probability densities or sample points to be able to construct a sufficiently accurate approximation of the belief space such that an optimal policy can be found (see [9] for a detailed review on PBVI algorithms).…”

Section: Acting Under Partial Observability (mentioning, confidence: 99%)

Learning search polices from humans in a partially observable context

Chambrier, Billard (2014). Robot. Biomim.


Decision making and planning when state information is only partially available is a problem faced by all forms of intelligent entities, whether virtual, synthetic or biological. The standard approach to solving such a decision problem mathematically is to formulate it as a partially observable Markov decision process (POMDP) and apply the same optimisation techniques used for Markov decision processes (MDPs). However, naively applying MDP solution methods to POMDPs makes the problem computationally intractable. To address this problem, we take a programming-by-demonstration approach to provide a solution to the POMDP in continuous state and action spaces. In this work, we model the decision-making process followed by humans when searching blindly for an object on a table. We show that by representing the belief of the human's position in the environment with a particle filter (PF) and learning a mapping from this belief to their end-effector velocities with a Gaussian mixture model (GMM), we can model the human's search process and reproduce it for any agent. We further categorise the behaviours demonstrated by humans as either risk-prone or risk-averse and find that more than 70% of the human searches were classified as risk-averse. We contrast the performance of this human-inspired search model with greedy and coastal navigation search methods. Our evaluation metrics are the distance travelled to reach the goal and how well each method reduces uncertainty. We further analyse the control policies of the coastal navigation and GMM search models and argue that taking uncertainty into account is more efficient with respect to the distance travelled to reach the goal.
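The excerpt from Chambrier and Billard above summarises the idea at the heart of point-based solvers: value iteration is run only over a finite set of belief points rather than over the whole belief simplex. The following Python sketch illustrates a PBVI-style point-based backup under stated assumptions; it is not taken from the survey. The two-state Tiger problem, its parameters (listen accuracy 0.85, rewards -1/+10/-100, discount 0.95), the fixed belief set and every function name here are illustrative choices. Belief-set expansion, the step on which the surveyed algorithms differ most, is deliberately left out.

```python
# Minimal sketch of a PBVI-style point-based backup over a FIXED belief set.
# Assumptions (illustrative, not from the survey): the Tiger POMDP with
# listen accuracy 0.85, rewards -1 / +10 / -100, discount 0.95.

S = [0, 1]                       # 0 = tiger-left, 1 = tiger-right
A = ["listen", "open-left", "open-right"]
OBS = [0, 1]                     # 0 = hear-left, 1 = hear-right
GAMMA = 0.95

def T(s, a, s2):
    # Listening leaves the state unchanged; opening a door resets it uniformly.
    if a == "listen":
        return 1.0 if s == s2 else 0.0
    return 0.5

def O(s2, a, o):
    # Listening reports the true side with probability 0.85; otherwise uninformative.
    if a == "listen":
        return 0.85 if o == s2 else 0.15
    return 0.5

def R(s, a):
    if a == "listen":
        return -1.0
    opened = 0 if a == "open-left" else 1
    return -100.0 if opened == s else 10.0

def dot(b, alpha):
    return sum(bi * ai for bi, ai in zip(b, alpha))

def backup(b, gamma_set):
    """Return the alpha-vector of the best action at belief b."""
    best_vec, best_val = None, float("-inf")
    for a in A:
        vec = [R(s, a) for s in S]
        for o in OBS:
            # Project every current alpha-vector through (a, o) ...
            projections = [
                [sum(T(s, a, s2) * O(s2, a, o) * alpha[s2] for s2 in S) for s in S]
                for alpha in gamma_set
            ]
            # ... and keep only the projection that is best at this belief point.
            g = max(projections, key=lambda p: dot(b, p))
            vec = [v + GAMMA * gi for v, gi in zip(vec, g)]
        val = dot(b, vec)
        if val > best_val:
            best_vec, best_val = vec, val
    return best_vec

def pbvi(beliefs, iterations=200):
    # Start from a lower bound: the worst immediate reward received forever.
    r_min = min(R(s, a) for s in S for a in A)
    gamma_set = [[r_min / (1 - GAMMA)] * len(S)]
    for _ in range(iterations):
        # One alpha-vector per belief point, so the set never grows.
        gamma_set = [backup(b, gamma_set) for b in beliefs]
    return gamma_set

def value(b, gamma_set):
    return max(dot(b, alpha) for alpha in gamma_set)

if __name__ == "__main__":
    B = [[p, 1 - p] for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0)]
    G = pbvi(B)
    for b in B:
        print(b, round(value(b, G), 2))
```

Because each iteration keeps one alpha-vector per belief point, the representation stays the same size however many iterations run; the trade-off is that each backup is exact only at the chosen points, which is why the choice of belief set matters so much.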
