In the factories of the future, productive and safe interaction between robots and human coworkers requires the robot to extract essential information about its coworker. We address this by designing a reliable framework for real-time, safe human-robot collaboration based on static hand gestures and 3D skeleton extraction. The OpenPose library is integrated with a Microsoft Kinect V2 to obtain a 3D estimate of the human skeleton. With the help of 10 volunteers, we recorded an image dataset of alphanumeric static hand gestures taken from American Sign Language. We named our dataset OpenSign and released it to the community for benchmarking. An Inception V3 convolutional neural network is fine-tuned on OpenSign to classify the static hand gestures.
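A minimal sketch of the OpenPose/Kinect integration described above: given a 2D keypoint from OpenPose and the aligned depth reading from the Kinect V2, a 3D joint position can be recovered by pinhole back-projection. The intrinsics below are illustrative placeholders, not the actual calibration used in the paper, and `backproject` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics (focal lengths and principal point, in pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u: float, v: float, depth_m: float, K: Intrinsics):
    """Back-project a 2D pixel (u, v) with metric depth into 3D camera coordinates."""
    x = (u - K.cx) * depth_m / K.fx
    y = (v - K.cy) * depth_m / K.fy
    return (x, y, depth_m)


# Placeholder intrinsics in the rough range of a Kinect V2 depth camera (assumption).
K = Intrinsics(fx=365.0, fy=365.0, cx=256.0, cy=212.0)

# An OpenPose keypoint (u, v) paired with its depth reading lifts to a 3D joint.
joint_3d = backproject(300.0, 200.0, 1.5, K)
```

Applying this to every detected keypoint yields the 3D skeleton estimate; in practice the depth map must first be registered to the color image in which OpenPose operates.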
This paper introduces BAZAR, a collaborative robot that integrates the most advanced sensing and actuating devices in a unique system designed for Industry 4.0. We present BAZAR's three main features, all paramount in the factory of the future: mobility for navigating dynamic environments, interaction for operating side by side with human workers, and dual-arm manipulation for transporting and assembling bulky objects. Keywords: Efficient, flexible and modular production • Robotics • Smart Assembly • Human-robot co-working • Real industrial world case studies • Digital Manufacturing and Assembly System • Machine Learning.
Intuitive user interfaces are indispensable for interacting with human-centric smart environments. In this paper, we propose a unified framework that recognizes both static and dynamic gestures using simple RGB vision (without depth sensing). This makes it suitable for inexpensive human-robot interaction in social or industrial settings. We employ a pose-driven spatial attention strategy, which guides our proposed Static and Dynamic gestures Network, StaDNet. From an image of the human upper body, we estimate the person's depth along with the regions of interest around the hands. The Convolutional Neural Network (CNN) in StaDNet is fine-tuned on a background-substituted hand gestures dataset. It is used to detect 10 static gestures for each hand and to obtain the hand image embeddings. These are fused with the augmented pose vector and passed to stacked Long Short-Term Memory blocks. Human-centred frame-wise information from the augmented pose vector and from the left/right hand image embeddings is thus aggregated in time to predict the dynamic gestures of the performing person. In a number of experiments, we show that the proposed approach surpasses the state-of-the-art results on the large-scale Chalearn 2016 dataset. Moreover, we transfer the knowledge learned through the proposed methodology to the Praxis gestures dataset, where the obtained results also surpass the state of the art.
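The fusion-and-aggregation step described above can be sketched as follows: per frame, the left/right hand image embeddings are concatenated with the augmented pose vector, and the resulting sequence is aggregated in time by an LSTM. This is a minimal NumPy illustration of that data flow, not the StaDNet implementation; all dimensions, random features, and weight initializations are illustrative assumptions, and a single hand-rolled LSTM cell stands in for the stacked LSTM blocks.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h, c, W, U, b):
    """One LSTM step; the four gates are packed as [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    H = h.size
    i, f, g, o = z[:H], z[H:2 * H], z[2 * H:3 * H], z[3 * H:]
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c


# Illustrative sizes: T frames, E-dim hand embeddings, P-dim pose vector, H hidden units.
T, E, P, H = 8, 16, 10, 32
left = rng.standard_normal((T, E))    # stand-in for left-hand CNN embeddings
right = rng.standard_normal((T, E))   # stand-in for right-hand CNN embeddings
pose = rng.standard_normal((T, P))    # stand-in for the augmented pose vector

# Per-frame fusion: concatenate both hand embeddings with the pose vector.
frames = np.concatenate([left, right, pose], axis=1)  # shape (T, 2E + P)

D = frames.shape[1]
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)

h = np.zeros(H)
c = np.zeros(H)
for x in frames:  # aggregate the fused frame-wise features in time
    h, c = lstm_step(x, h, c, W, U, b)
# h now summarizes the sequence; a classifier head would map it to dynamic gesture labels.
```

The key design point the sketch captures is that the temporal model never sees raw pixels: it operates only on compact, human-centred features, which keeps the sequence model small.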
In this paper, we present the design of a new camera combining both predator-like and prey-like vision features. This setup provides both a spherical RGB view and a directional depth view of the environment. The model and calibration of the full setup are described. A few examples demonstrate the interest and versatility of such a camera for robotics and video surveillance.