We propose a new benchmarking protocol to evaluate algorithms for bimanual robotic manipulation of semi-deformable objects. The benchmark is inspired by two real-world applications: (a) watchmaking craftsmanship, and (b) belt assembly in automobile engines. We provide two setups that highlight the following challenges: (a) manipulating objects via a tool, (b) placing irregularly shaped objects in the correct groove, (c) handling semi-deformable objects, and (d) bimanual coordination. We provide CAD drawings of the task pieces that can be easily 3D printed to ensure ease of reproduction, detailed descriptions of the tasks and protocol for successful reproduction, and meaningful metrics for comparison. We propose four categories of submission to make the benchmark accessible to a wide range of related fields, spanning adaptive control and motion planning to learning the tasks through trial and error.

Index Terms: Performance evaluation and benchmarking, dual-arm manipulation, model learning for control, dexterous manipulation.

I. INTRODUCTION

A variety of industrial tasks are still performed by humans today, as they require a level of precision and dexterity not yet available in robots. These tasks require the use of prehensile instruments, such as screwdrivers or tweezers, to grasp, insert, and manipulate tiny and deformable objects. Examples of such tasks are common in watchmaking craftsmanship, where both assembly and screwing are core actions of the whole process, and in the pharmaceutical industry, where pipettes and vials must be handled. There is interest in automating parts of these tasks [1]. Such precise manipulation can also be
As robots perform manipulation tasks and interact with objects, they may accidentally drop objects that subsequently bounce out of their visual fields (e.g., due to an inadequate grasp of an unfamiliar object). To enable robots to recover from such errors, we draw upon the concept of object permanence: objects remain in existence even when they are not being sensed (e.g., seen) directly. In particular, we developed a multimodal neural network model, using a partial observed bounce trajectory and the audio resulting from the drop impact as its inputs, to predict the full bounce trajectory and the end location of a dropped object. We empirically show that: (1) our multimodal method predicted end locations close in proximity (i.e., within the visual field of the robot's wrist camera) to the actual locations, and (2) the robot was able to retrieve dropped objects by applying minimal vision-based pick-up adjustments. Additionally, we show that our method outperformed five comparison baselines in retrieving dropped objects. Our results contribute to enabling object permanence for robots and recovery from object-drop errors.
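The abstract above describes a multimodal predictor that fuses a partial bounce trajectory with impact audio to regress an object's end location. The following is a minimal sketch of that fusion pattern only, not the authors' model: the encoders, weight shapes, and function names (`encode_trajectory`, `encode_audio`, `predict_end_location`) are illustrative assumptions, and the random weights stand in for parameters a trained network would learn.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_trajectory(partial_traj):
    # Flatten the observed (x, y) bounce points into a feature vector.
    return partial_traj.reshape(-1)

def encode_audio(audio_feats):
    # Audio from the drop impact (e.g., spectrogram frames), averaged over time.
    return audio_feats.mean(axis=0)

# Hypothetical linear fusion head; a trained model would learn these weights.
# Input size: 5 points * 2 coords + 8 audio features = 18.
W = rng.normal(scale=0.1, size=(2, 18))
b = np.zeros(2)

def predict_end_location(partial_traj, audio_feats):
    # Concatenate both modalities and regress the 2-D end location.
    z = np.concatenate([encode_trajectory(partial_traj), encode_audio(audio_feats)])
    return W @ z + b

partial_traj = rng.normal(size=(5, 2))   # five observed (x, y) points
audio_feats = rng.normal(size=(20, 8))   # 20 frames of 8 audio features each
end_xy = predict_end_location(partial_traj, audio_feats)
print(end_xy.shape)  # (2,): predicted (x, y) end location
```

A real system would replace the linear head with a learned network and feed the predicted location to the wrist camera for the vision-based pick-up adjustment the abstract mentions.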