This paper presents a novel class of algorithms, called Heuristically-Accelerated Multiagent Reinforcement Learning (HAMRL), which allows the use of heuristics to speed up well-known multiagent reinforcement learning (RL) algorithms such as Minimax-Q. HAMRL algorithms are characterized by a heuristic function, which suggests the selection of particular actions over others. This function represents an initial action selection policy, which can be handcrafted, extracted from previous experience in distinct domains, or learnt from observation. To validate the proposal, a thorough theoretical analysis proving the convergence of four algorithms from the HAMRL class (HAMMQ, HAMQ(λ), HAMQS, and HAMS) is presented. In addition, a comprehensive, systematic evaluation was conducted in two distinct adversarial domains. The results show that even the most straightforward heuristics can produce virtually optimal action selection policies in far fewer episodes, significantly improving the performance of HAMRL algorithms over vanilla RL algorithms.
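As a rough illustration of how a heuristic function can bias action selection, the sketch below adds a weighted heuristic bonus to each action value before taking the greedy choice. It is a simplified, single-agent, epsilon-greedy view and not the HAMMQ update itself; the function name `heuristic_action`, the weight `xi`, and the additive form of the bonus are assumptions made for illustration, since the abstract does not state the selection rule.

```python
import random


def heuristic_action(q_values, heuristic, xi=1.0, epsilon=0.1, rng=random):
    """Return an action index chosen from Q-values biased by a heuristic.

    q_values  -- list of action-value estimates for the current state
    heuristic -- list of heuristic bonuses, one per action (assumed form)
    xi        -- hypothetical weight controlling the heuristic's influence
    epsilon   -- probability of picking a uniformly random action
    """
    n = len(q_values)
    # Explore: with probability epsilon, ignore both Q and the heuristic.
    if rng.random() < epsilon:
        return rng.randrange(n)
    # Exploit: greedy choice over the heuristically shifted values.
    scores = [q + xi * h for q, h in zip(q_values, heuristic)]
    return max(range(n), key=scores.__getitem__)
```

With all Q-values equal, as at the start of learning, the heuristic alone decides the greedy action; setting `xi=0` recovers plain epsilon-greedy selection.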
We present a method for fast training of vision-based control policies on real robots. The key idea behind our method is to perform multi-task Reinforcement Learning with auxiliary tasks that differ not only in the reward to be optimized but also in the state-space in which they operate. In particular, we allow auxiliary task policies to utilize task features that are available only at training time. This allows for fast learning of auxiliary policies, which subsequently generate good data for training the main, vision-based control policies. This method can be seen as an extension of the Scheduled Auxiliary Control (SAC-X) framework. We demonstrate the efficacy of our method by using both a simulated and a real-world Ball-in-a-Cup game controlled by a robot arm. In simulation, our approach leads to significant learning speed-ups when compared to standard SAC-X. On the real robot we show that the task can be learned from scratch, i.e., with no transfer from simulation and no imitation learning. Videos of our learned policies running on the real robot can be found at https
This is the author accepted manuscript. The final version is available from [publisher] via http://dx.doi.org/doi:10.1080/0952813X.2015.1132265 Spatial knowledge plays an essential role in human reasoning, permitting tasks such as locating objects in the world (including oneself), reasoning about everyday actions and describing perceptual information. This is also the case in the field of mobile robotics, where one of the most basic (and essential) tasks is the autonomous determination of the pose of a robot with respect to a map, given its perception of the environment. This is the problem of robot self-localisation (or simply the localisation problem). This paper presents a probabilistic algorithm for robot self-localisation that is based on a topological map constructed from the observation of spatial occlusion. Distinct locations on the map are defined by means of a classical formalism for qualitative spatial reasoning, whose base definitions are closer to the human categorisation of space than traditional, numerical localisation procedures. The proposed approach was systematically evaluated through experiments using a mobile robot equipped with an RGB-D sensor. The results show that the localisation algorithm succeeds in locating the robot in qualitatively distinct regions.
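Probabilistic localisation over a set of qualitatively distinct regions can be pictured as a discrete Bayes filter. The sketch below is a generic filter of that kind, not the algorithm of the paper: the abstract does not specify the update rule, and the function name `bayes_update`, the transition matrix, and the observation likelihoods are all hypothetical.

```python
def bayes_update(belief, transition, likelihood):
    """One predict-and-correct step of a discrete Bayes filter.

    belief     -- list of probabilities, one per map region
    transition -- transition[i][j] = assumed P(next region j | region i)
    likelihood -- assumed P(current observation | region j), one per region
    """
    n = len(belief)
    # Predict: propagate the belief through the motion model.
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Correct: weight each region by how well it explains the observation.
    unnormalised = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalised)
    if total == 0:
        # The observation is impossible under the model; keep the prediction.
        return predicted
    return [u / total for u in unnormalised]
```

Repeating this step as the robot moves and observes concentrates the belief on the region whose expected observations, such as a particular occlusion relation between landmarks, best match what the sensor reports.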