Social robotics is an emerging area that introduces autonomous social robots into social spaces. Social robots offer services, perform tasks, and interact with people in these environments, demanding more efficient and complex Human–Robot Interaction (HRI) designs. One strategy to improve HRI is to give robots the capacity to detect the emotions of the people around them, so that they can plan a trajectory, modify their behaviour, and generate an appropriate interaction based on the analysed information. However, in social environments where groups of people are common, new approaches are needed to enable robots to recognise groups of people and the emotion of those groups, which can also be associated with the scene in which the group is participating. Some existing studies focus on detecting group cohesion and recognising group emotions; nevertheless, these works do not perform the recognition tasks from a robocentric perspective, that is, considering the sensory capacity of robots. In this context, a system is presented that recognises scenes in terms of groups of people and then detects the global (prevailing) emotion in a scene. The proposed approach to visualising and recognising emotions in typical HRI is based on the face size of the people recognised by the robot during its navigation (face sizes decrease when the robot moves away from a group of people). On each frame of the visual sensor's video stream, individual emotions are recognised with the Visual Geometry Group (VGG) neural network pre-trained to recognise faces (VGGFace); the individual emotions are then aggregated with a fusion method to detect the emotion of the frame, and the emotions of the constituent frames are aggregated in turn to detect the global (prevalent) emotion of the scene (group of people). Additionally, this work proposes a strategy to create datasets of images/videos for validating the estimation of scene emotions and personal emotions. Both datasets are generated in a simulated environment based on the Robot Operating System (ROS) from videos captured by robots through their sensory capabilities. Tests are performed in two simulated environments in ROS/Gazebo: a museum and a cafeteria. Results show that the accuracy of individual emotion detection is 99.79%, and the accuracy of group emotion (scene emotion) detection in each frame is 90.84% and 89.78% in the cafeteria and museum scenarios, respectively.
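The abstract describes a two-stage aggregation: per-face emotions are fused into a frame-level emotion, and frame-level emotions are fused into the prevailing scene emotion. The exact fusion rule is not specified here, so the Python sketch below is only illustrative: it assumes average-probability fusion over the faces in a frame and a majority vote over frame polarities for the scene, with class lists chosen to match the emotions named in the abstracts.

```python
import numpy as np

# Assumed classes: six individual emotions and three scene-level polarities,
# as listed in the abstracts; the polarity mapping is an assumption.
EMOTIONS = ["happy", "neutral", "sad", "disgust", "fear", "anger"]
POLARITY = {"happy": "positive", "neutral": "neutral", "sad": "negative",
            "disgust": "negative", "fear": "negative", "anger": "negative"}

def fuse_frame_emotion(face_probs):
    """Fuse per-face emotion probabilities into one frame-level emotion.

    face_probs: list of length-6 probability vectors, one per detected face
                (e.g. softmax outputs of a VGGFace-based classifier).
    Returns the dominant emotion label for the frame, or None if no faces.
    """
    if not face_probs:
        return None
    mean_probs = np.mean(np.asarray(face_probs), axis=0)  # average fusion (assumed rule)
    return EMOTIONS[int(np.argmax(mean_probs))]

def fuse_scene_emotion(frame_emotions):
    """Aggregate frame-level emotions into the prevailing scene emotion
    by majority vote over the frames' polarities (assumed rule)."""
    votes = [POLARITY[e] for e in frame_emotions if e is not None]
    if not votes:
        return "neutral"
    return max(set(votes), key=votes.count)
```

Any other fusion rule (e.g. weighting faces by their size, which the abstract uses as a distance cue) would plug into the same two-stage structure.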
Social robotics is an emerging area that fosters the integration of robots and humans in the same environment. With this objective, robots include capacities such as detecting people's emotions so that they can plan their trajectory, modify their behavior, and generate a positive interaction with people based on the analyzed information. Many algorithms developed for robots to accomplish tasks such as people recognition, tracking, emotion detection, and demonstrating empathy need large and reliable datasets to evaluate their effectiveness and efficiency. Most existing datasets do not adopt the first-person perspective of the robot's sensory capacity, but rather a third-person perspective from cameras external to the robot. In this context, we propose an approach to create datasets with a robot-centric perspective. Based on the proposed approach, we built a dataset with 23,222 images and 24 videos, recorded through the sensory capacity of a Pepper robot in simulated environments, which is used to recognize individual and group emotions. We develop two virtual environments (a cafeteria and a museum) in which people, alone and in groups, express different emotions and are captured from the point of view of the Pepper robot. We labeled the dataset using the Viola-Jones algorithm for face detection, classifying individual emotions into six types: happy, neutral, sad, disgust, fear, and anger. Based on the group emotions observed by the robot, the videos were classified into three emotions: positive, negative, and neutral. To show the suitability and utility of the dataset, we train and evaluate the VGG-Face network. The accuracy achieved by this algorithm was 99% in the recognition of individual emotions, and group emotion detection reached 90.84% and 89.78% in the cafeteria and museum scenarios, respectively.
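As a rough illustration of the face-detection step used to label the dataset, the sketch below runs OpenCV's Haar-cascade implementation of the Viola-Jones detector over a recorded video and crops the detected faces. The file paths, output layout, and the emotion-assignment step are placeholders; the abstract does not detail how labels are attached to each crop.

```python
import os
import cv2

# Hypothetical paths; the actual dataset layout is not specified in the abstract.
VIDEO_PATH = "pepper_cafeteria_01.mp4"
OUTPUT_DIR = "faces"

# OpenCV ships a pre-trained Viola-Jones (Haar cascade) frontal-face detector.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

os.makedirs(OUTPUT_DIR, exist_ok=True)
cap = cv2.VideoCapture(VIDEO_PATH)
frame_idx, face_idx = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # detectMultiScale returns one (x, y, w, h) box per detected face.
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        crop = frame[y:y + h, x:x + w]
        # Each crop would then receive one of the six emotion labels
        # (e.g. from the simulation's ground truth) before training VGG-Face.
        cv2.imwrite(os.path.join(OUTPUT_DIR, f"f{frame_idx:05d}_{face_idx}.png"), crop)
        face_idx += 1
    frame_idx += 1
cap.release()
```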