A recurrent spiking neural network is proposed that implements planning as probabilistic inference for finite and infinite horizon tasks. The architecture splits this problem into two parts: The stochastic transient firing of the network embodies the dynamics of the planning task. With appropriate injected input this dynamics is shaped to generate high-reward state trajectories. A general class of reward-modulated plasticity rules for these afferent synapses is presented. The updates optimize the likelihood of getting a reward through a variant of an Expectation Maximization algorithm and learning is guaranteed to convergence to a local maximum. We find that the network dynamics are qualitatively similar to transient firing patterns during planning and foraging in the hippocampus of awake behaving rats. The model extends classical attractor models and provides a testable prediction on identifying modulating contextual information. In a real robot arm reaching and obstacle avoidance task the ability to represent multiple task solutions is investigated. The neural planning method with its local update rules provides the basis for future neuromorphic hardware implementations with promising potentials like large data processing abilities and early initiation of strategies to avoid dangerous situations in robot co-worker scenarios.
Abstract-In whole-body control, joint torques and external forces need to be estimated accurately. In principle, this can be done through pervasive joint-torque sensing and accurate system identification. However, these sensors are expensive and may not be integrated in all links. Moreover, the exact position of the contact must be known for a precise estimation. If contacts occur on the whole body, tactile sensors can estimate the contact location, but this requires a kinematic spatial calibration, which is prone to errors. Accumulating errors may have dramatic effects on the system identification. As an alternative to classical model-based approaches we propose a data-driven mixture-of-experts learning approach using Gaussian processes. This model predicts joint torques directly from raw data of tactile and force/torque sensors. We compare our approach to an analytic model-based approach on real world data recorded from the humanoid iCub. We show that the learned model accurately predicts the joint torques resulting from contact forces, is robust to changes in the environment and outperforms existing dynamic models that use of force/ torque sensor data.
Abstract-One of the key problems in planning and control of redundant robots is the fast generation of controls when multiple tasks and constraints need to be satisfied. In the literature, this problem is classically solved by multi-task prioritized approaches, where the priority of each task is determined by a weight function, describing the task strict/soft priority. In this paper, we propose to leverage machine learning techniques to learn the temporal profiles of the task priorities, represented as parametrized weight functions: we automatically determine their parameters through a stochastic optimization procedure. We show the effectiveness of the proposed method on a simulated 7 DOF Kuka LWR and both a simulated and a real Kinova Jaco arm. We compare the performance of our approach to a state-of-the-art method based on soft task prioritization, where the task weights are typically hand-tuned.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.