Abstract. In the domain of decentralized Markov decision processes, we develop the first complete and optimal algorithm that is able to extract deterministic policy vectors based on finite state controllers for a cooperative team of agents. Our algorithm applies to the discounted infinite horizon case and extends best-first search methods to the domain of decentralized control theory. We prove the optimality of our approach and give some first experimental results for two small test problems. We believe this to be an important step forward in learning and planning in stochastic multi-agent systems.
In the following paper we present a new algorithm for cooperative reinforcement learning in multi-agent systems. We consider autonomous and independently learning agents, and we seek to obtain an optimal solution for the team as a whole while keeping the learning as much decentralized as possible. Coordination between agents occurs through communication, namely the mutual notification algorithm.We define the learning problem as a decentralized process using the MDP formalism. We then give an optimality criterion and prove the convergence of the algorithm for deterministic environments. We introduce variable and hierarchical communication strategies which considerably reduce the number of communications. Finally we study the convergence properties and communication overhead on a small example.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.