Collaborative robots, or co-bots, are a category of robots designed to work together with humans. By combining the robot's strengths, such as precision and physical strength, with the dexterity and problem-solving ability of the human, it is possible to accomplish tasks that cannot be fully automated and to improve both production quality and the working conditions of workers. This paper presents the results of the ClaXon project, which aims to study and implement interactions between humans and collaborative robots in factories. The project has led to the integration of a co-bot in the car manufacturing plant of Audi Brussels in Belgium. Proofs of concept were realized to study multimodal perception for human-robot interaction. The project addressed technical challenges regarding the introduction of collaborative robots on the factory floor. Social experiments were conducted with factory workers to assess the social acceptance of co-bots and to study the interactions between human and robot.
This paper presents a novel audio-visual diviseme (viseme pair) instance selection and concatenation method for speech-driven, photo-realistic mouth animation. First, an audio-visual diviseme database is built, consisting of the audio feature sequences, intensity sequences, and visual feature sequences of the instances. In the Viterbi-based diviseme instance selection, the accumulative cost is set as the weighted sum of three terms: 1) the logarithm of the concatenation smoothness of the synthesized mouth trajectory; 2) the logarithm of the pronunciation distance; and 3) the logarithm of the audio intensity distance, where the latter two distances are measured between the candidate diviseme instance and the target diviseme segment in the incoming speech. The selected diviseme instances are time-warped and blended to construct the mouth animation. Objective and subjective evaluations of the synthesized mouth animations show that the proposed multimodal diviseme instance selection algorithm outperforms the triphone unit selection algorithm in Video Rewrite. Clear, accurate, and smooth mouth animations can be obtained that match well with the pronunciation and intensity changes in the incoming speech. Moreover, with the logarithm function in the accumulative cost, it is easy to set the weights to obtain optimal mouth animations.
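The Viterbi-based selection described above can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate layout (a dict with `start`, `end`, `pron`, `inten`), the scalar mouth feature, the absolute-difference smoothness measure, the `EPS` guard, and the function names are all assumptions made for illustration. Only the overall shape, a weighted sum of three logarithmic terms minimized over a trellis of candidate instances, follows the abstract.

```python
import math

EPS = 1e-6  # assumed guard so math.log never receives zero


def target_cost(cand, w_pron, w_inten):
    # Weighted log pronunciation distance plus weighted log intensity distance
    # between a candidate instance and its target segment (both precomputed).
    return (w_pron * math.log(cand["pron"] + EPS)
            + w_inten * math.log(cand["inten"] + EPS))


def join_cost(prev, cand, w_smooth):
    # Assumed smoothness measure: gap between the mouth feature at the end of
    # the previous instance and at the start of the current one.
    return w_smooth * math.log(abs(prev["end"] - cand["start"]) + EPS)


def select_instances(segments, weights=(1.0, 1.0, 1.0)):
    """Return one candidate index per target segment, minimizing the
    accumulated cost. `segments` holds one non-empty candidate list per
    target diviseme segment."""
    w_smooth, w_pron, w_inten = weights
    if not segments:
        return []
    # best[t][j]: lowest accumulated cost of any path ending at candidate j
    best = [[target_cost(c, w_pron, w_inten) for c in segments[0]]]
    back = [[-1] * len(segments[0])]
    for t in range(1, len(segments)):
        row_cost, row_back = [], []
        for cand in segments[t]:
            totals = [best[t - 1][i] + join_cost(prev, cand, w_smooth)
                      for i, prev in enumerate(segments[t - 1])]
            i_best = min(range(len(totals)), key=totals.__getitem__)
            row_cost.append(totals[i_best] + target_cost(cand, w_pron, w_inten))
            row_back.append(i_best)
        best.append(row_cost)
        back.append(row_back)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for t in range(len(segments) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path
```

Because every term enters through a logarithm, the three distances are compared on a relative rather than absolute scale, which is one plausible reading of why the abstract reports that the weights are easy to set. In this sketch a near-zero join gap yields a strongly negative cost, so the choice of `EPS` effectively caps how much a perfect join can dominate the other terms.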