Due to their ability to learn complex behaviors in high-dimensional state-action spaces, deep reinforcement learning algorithms have attracted much interest in the robotics community. For a practical reinforcement learning implementation, reward signals have to be informative, in the sense that they discriminate between nearby states, and they must not be too noisy. To address the first issue, prior information, e.g. in the form of a geometric model, or human supervision is often assumed. This paper proposes a method to learn binocular fixations without such prior information. Instead, it uses an informative reward requiring little supervised information. The reward computation is based on an anomaly detection mechanism which uses convolutional autoencoders. These detectors estimate, in a weakly supervised way, an object's pixel position. This position estimate is affected by noise, which makes the reward signal noisy. We first show that this noise affects both the learning speed and the resulting policy. We then propose a method to partially remove the noise by regressing the detection change on the sensor data. The binocular fixation task is learned in a simulated environment on a training set of objects with various shapes and colors. The learned policy is compared with one learned with a highly informative and noiseless reward signal. The tests are carried out on the training set and on a test set of new objects. We observe similar performance, showing that the environment-encoding step can replace the prior information.
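The anomaly-based reward described above can be sketched roughly as follows. This is an illustrative assumption, not the paper's exact formulation: a precomputed background reconstruction stands in for a trained convolutional autoencoder, and the threshold, centroid extraction, and reward shape are hypothetical choices.

```python
import numpy as np

def object_position_from_anomaly(image, reconstruction, threshold=0.1):
    """Estimate an object's pixel position as the centroid of the anomaly
    map, i.e. the per-pixel reconstruction error of an autoencoder trained
    on object-free background scenes (high error = likely object pixel)."""
    error = np.abs(image - reconstruction).mean(axis=-1)  # per-pixel error
    mask = error > threshold                              # anomalous pixels
    if not mask.any():
        return None                                       # no detection
    ys, xs = np.nonzero(mask)
    return np.array([xs.mean(), ys.mean()])               # centroid (x, y)

def fixation_reward(position, image_shape):
    """Reward grows as the detected object approaches the image center."""
    if position is None:
        return -1.0
    center = np.array([image_shape[1] / 2.0, image_shape[0] / 2.0])
    return -np.linalg.norm(position - center) / np.linalg.norm(center)
```

Because the centroid of the thresholded error map jitters with sensor noise, the resulting reward inherits that noise, which is exactly the issue the regression-based denoising step targets.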
In some robotic manipulation environments, hand-programming a robot behavior is often intractable because of the difficulty of precisely modeling the dynamics and of computing features that describe the variety of scene appearances well. Deep reinforcement learning methods partially alleviate this problem in that they can dispense with hand-crafted features for the state representation and do not need precomputed dynamics. However, they often use prior information in the task definition in the form of shaping rewards, which guide the robot toward goal state areas but require engineering or human supervision and can lead to sub-optimal behavior. In this work, we consider a complex robot reaching task with a large range of initial object and arm positions and propose a new learning approach with minimal supervision. Inspired by developmental robotics, our method consists of a weakly supervised, stage-wise procedure of three tasks. First, the robot learns to fixate the object with a two-camera system. Second, it learns hand-eye coordination by learning to fixate its end effector. Third, using the knowledge acquired in the previous steps, it learns to reach the object at different positions and from a large set of initial robot joint angles. Experiments in a simulated environment show that our stage-wise framework yields reaching performance similar to that of a supervised setting, without using kinematic models, hand-crafted features, calibration parameters, or supervised visual modules.
Human infants learn to manipulate objects in a largely autonomous fashion, starting without precise models of their bodies' kinematics and dynamics. Replicating such learning abilities in robots would make them more flexible and robust and is considered a grand challenge of developmental robotics. In this paper, we propose a developmental method that allows a robot to learn to touch an object while also learning to predict whether the object is within reach. Importantly, our method does not rely on any forward or inverse kinematics models. Instead, it uses a stage-wise learning approach combining deep reinforcement learning and a form of self-supervised learning. In this approach, complex skills such as touching an object or predicting if it is within reach are learned on top of more basic skills such as object fixation and eye-hand coordination.

Index Terms: Sensori-motor learning, reinforcement learning, object reachability

This work was funded by the French government research program "Investissements d'avenir" through the IMobS3 Laboratory of Excellence (ANR-10-LABX-16-01), by the European Union through the Horizon 2020 Research and Innovation Programme under grant agreement no. 713010 (GOAL-Robots, Goal-based Open-ended Autonomous Learning Robots) and through the program Regional competitiveness and employment (ERDF Auvergne region), and by the Auvergne region. J. Triesch acknowledges support from the Quandt foundation.
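The self-supervised reachability prediction mentioned above could, under strong simplifying assumptions, be sketched as a classifier fitted on the robot's own touch outcomes. Everything here is hypothetical and not the paper's architecture: logistic regression replaces the learned model, raw object positions replace learned sensory features, and the labels are assumed linearly separable.

```python
import numpy as np

def train_reachability_classifier(positions, touched, epochs=2000, lr=1.0):
    """Fit a logistic-regression reachability predictor on self-generated
    experience: object positions (N, 3) labelled by touch success (N,).
    The labels come for free from the robot's own touch attempts."""
    X = np.hstack([positions, np.ones((len(positions), 1))])  # append bias
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted touch probability
        w -= lr * X.T @ (p - touched) / len(X)    # gradient step on log-loss
    return w

def predict_reachable(w, position):
    """Return True if the object at `position` is predicted to be in reach."""
    x = np.append(position, 1.0)
    return bool(1.0 / (1.0 + np.exp(-x @ w)) > 0.5)
```

The point of the sketch is the supervision source: no human labels are needed, since each reaching attempt labels its own object position.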
Learning complex behaviors through reinforcement learning is particularly challenging when reward is only available upon successful completion of the full behavior. In manipulation robotics, so-called shaping rewards are often used to overcome this problem. However, these usually require human engineering or (partial) world models describing, e.g., the kinematics of the robot, or high-level modules for perception. Here we propose an alternative method to learn an object palm-touching task through weakly supervised, stage-wise learning of simpler tasks. First, the robot learns to fixate the object with its cameras. Second, the robot learns eye-hand coordination by learning to fixate its end effector. Third, using the previously acquired skills, an informative shaping reward can be computed which facilitates efficient learning of the object palm-touching task. We demonstrate in simulation that learning the full task with this shaping reward is comparable to learning with an informative supervised reward.
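The shaping reward built from the first two stages can be sketched as follows. The exponential form, the scale parameter, and the function names are illustrative assumptions; the source only states that the fixation and eye-hand coordination skills are combined into an informative reward.

```python
import numpy as np

def shaping_reward(object_gaze, hand_gaze, scale=1.0):
    """Hypothetical shaping reward for the palm-touching stage.

    `object_gaze`: camera angles that fixate the object (stage 1 skill).
    `hand_gaze`:   camera angles predicted to fixate the end effector
                   (stage 2 hand-eye coordination skill).
    Their distance in gaze space is a proxy for the hand-object distance,
    so the reward rises smoothly as the hand approaches the object."""
    d = np.linalg.norm(np.asarray(object_gaze) - np.asarray(hand_gaze))
    return float(np.exp(-scale * d))  # in (0, 1], maximal at contact
```

The design point is that neither term requires a kinematic model or calibration: both gaze vectors come from previously learned skills, which is what makes the dense reward weakly supervised.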