Abstract-Markov decision processes (MDPs) are often used to model sequential decision problems involving uncertainty under the assumption of centralized control. However, many large, distributed systems do not permit centralized control due to communication limitations (such as cost, latency or corruption). This paper surveys recent work on decentralized control of MDPs in which control of each agent depends on a partial view of the world. We focus on a general framework where there may be uncertainty about the state of the environment, represented as a decentralized partially observable MDP (Dec-POMDP), but consider a number of subclasses with different assumptions about uncertainty and agent independence. In these models, a shared objective function is used, but plans of action must be based on a partial view of the environment. We describe the frameworks, along with the complexity of optimal control and important properties. We also provide an overview of exact and approximate solution methods as well as relevant applications. This survey provides an introduction to what has become an active area of research on these models and their solutions.
Abstract-Real-world robots commonly have to act in complex, poorly understood environments where the true world dynamics are unknown. To compensate for the unknown world dynamics, we often provide a class of models to a learner so it may select a model, typically using a minimum prediction error metric over a set of training data. Often in realworld domains the model class is unable to capture the true dynamics, due to either limited domain knowledge or a desire to use a small model. In these cases we call the model class misspecified, and an unfortunate consequence of misspecification is that even with unlimited data and computation there is no guarantee the model with minimum prediction error leads to the best performing policy. In this work, our approach improves upon the standard maximum likelihood model selection metric by explicitly selecting the model which achieves the highest expected reward, rather than the most likely model. We present an algorithm for which the highest performing model from the model class is guaranteed to be found given unlimited data and computation. Empirically, we demonstrate that our algorithm is often superior to the maximum likelihood learner in a batch learning setting for two common RL benchmark problems and a third real-world system, the hydrodynamic cart-pole, a domain whose complex dynamics cannot be known exactly.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.