For social robots, knowledge of human emotional states is essential for adapting their behavior or associating emotions with other entities. Robots gather the information from which emotions are detected through different media, such as text, speech, images, or videos. The multimedia content is then processed to recognize emotions/sentiments, for example, by analyzing faces and postures in images/videos with machine learning techniques or by converting speech into text to perform emotion detection with natural language processing (NLP) techniques. Keeping this information in semantic repositories offers a wide range of possibilities for implementing smart applications. We propose a framework that allows social robots to detect emotions and store this information in a semantic repository based on EMONTO (an EMotion ONTOlogy), an ontology to represent emotions. As a proof of concept, we develop a first version of this framework focused on emotion detection in text, which can be obtained directly as text or by converting speech to text. We tested the implementation with a case study of tour-guide robots for museums that relies on a speech-to-text converter based on the Google Application Programming Interface (API) and a Python library, a neural network that labels emotions in text using NLP transformers, and EMONTO integrated with an ontology for museums; thus, it is possible to register the emotions that artworks produce in visitors. We evaluated the classification model, obtaining results equivalent to a state-of-the-art transformer-based model and a clear roadmap for improvement.
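A minimal sketch of the kind of pipeline described above: speech is transcribed with the Google Web Speech API through the SpeechRecognition Python library, and the resulting text is labelled by a transformer-based classifier. The specific model name and file names below are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch: speech-to-text via the Google Web Speech API, then
# emotion labelling with a transformer-based text classifier.
import speech_recognition as sr
from transformers import pipeline

def transcribe(audio_path: str) -> str:
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)
    # recognize_google uses Google's free Web Speech API
    return recognizer.recognize_google(audio)

# Assumed emotion model; any transformer fine-tuned for emotion labels could be used.
emotion_classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

text = transcribe("visitor_comment.wav")          # hypothetical audio file
prediction = emotion_classifier(text)[0]          # e.g. {"label": "joy", "score": 0.93}
print(text, "->", prediction["label"])
```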
Emotion recognition is a strategy social robots use to improve Human-Robot Interaction and to model their social behaviour. Since human emotions can be expressed in different ways (e.g., face, gesture, voice), multimodal approaches are useful to support the recognition process. However, although there are studies dealing with multimodal emotion recognition for social robots, they still present limitations in the fusion process: their performance drops when one or more modalities are missing or when modalities differ in quality. This is a common situation in social robotics, given the wide variety of robots' sensory capacities; hence, more flexible multimodal models are needed. In this context, we propose an adaptive and flexible emotion recognition architecture able to work with multiple sources and modalities of information and to manage different levels of data quality and missing data, leading robots to better understand the mood of people in a given environment and adapt their behaviour accordingly. Each modality is analyzed independently, and the partial results are then aggregated with a previously proposed fusion method, called EmbraceNet+, which is adapted and integrated into our framework. We also present an extensive review of state-of-the-art studies on fusion methods for multimodal emotion recognition. We evaluate the performance of the proposed architecture through different tests in which several modalities are combined to classify emotions into four categories (i.e., happiness, neutral, sadness, and anger). Results reveal that our approach is able to adapt to the quality and presence of modalities. Furthermore, the results are validated and compared with other similar proposals, showing competitive performance with state-of-the-art models.
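The following is a simplified, assumption-based sketch of an EmbraceNet-style fusion layer that tolerates missing modalities; it is not the authors' EmbraceNet+ architecture, only an illustration of how per-modality features can be docked to a common size and stochastically combined according to which modalities are available.

```python
# Sketch of EmbraceNet-style fusion with an availability mask for missing modalities.
import torch
import torch.nn as nn

class EmbraceFusion(nn.Module):
    def __init__(self, input_sizes, embed_size=256):
        super().__init__()
        # one "docking" layer per modality, mapping to a shared embedding size
        self.docking = nn.ModuleList(nn.Linear(s, embed_size) for s in input_sizes)
        self.embed_size = embed_size

    def forward(self, inputs, availability):
        # inputs: list of [batch, input_size_m] tensors
        # availability: [batch, M] float mask in {0., 1.} (0 = modality missing)
        docked = torch.stack([torch.relu(d(x)) for d, x in zip(self.docking, inputs)], dim=1)
        probs = availability / availability.sum(dim=1, keepdim=True)            # [batch, M]
        # for each embedding index, pick one available modality at random
        choice = torch.multinomial(probs, self.embed_size, replacement=True)    # [batch, embed]
        mask = torch.zeros_like(docked).scatter_(1, choice.unsqueeze(1), 1.0)
        return (docked * mask).sum(dim=1)   # fused embedding [batch, embed_size]
```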
Nowadays, mobile robots play an important role in different areas of science, industry, academia, and even everyday life, and their abilities and behaviours become increasingly complex. In particular, in indoor environments such as hospitals, schools, banks, and museums, where the robot coexists with people and other robots, its movement and navigation must be programmed and adapted to robot–robot and human–robot interactions. However, existing approaches focus either on multi-robot navigation (robot–robot interaction) or on social navigation with human presence (human–robot interaction), neglecting the integration of both. Proxemic interaction has recently been used in this research domain to improve Human–Robot Interaction (HRI). In this context, we propose an autonomous navigation approach for mobile robots in indoor environments based on the principles of proxemic theory, integrated with classical navigation algorithms such as ORCA, Social Momentum, and A*. With this novel approach, the mobile robot adapts its behaviour by analysing the proximity of people to each other, to itself, and to other robots in order to decide and plan its navigation, while exhibiting acceptable social behaviours in the presence of humans. We describe the proposed approach and show how proxemics and the classical navigation algorithms are combined to provide effective navigation while respecting social human distances. To show the suitability of our approach, we simulate several situations in which robots and humans coexist, demonstrating effective social navigation.
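As an illustration of how proxemic theory can be combined with a classical planner, the sketch below inflates a 2-D cost grid with penalties derived from Hall's proxemic zones before a planner such as A* is run. The zone radii follow Hall's standard distances; the penalty values, grid resolution, and function names are assumptions for illustration only.

```python
# Sketch: add proxemic penalties around detected people to a cost grid used by A*.
import numpy as np

# Hall's proxemic distances (metres) and assumed penalties for entering each zone.
PROXEMIC_ZONES = [(0.45, np.inf),   # intimate zone: treated as an obstacle
                  (1.20, 50.0),     # personal zone: high penalty
                  (3.60, 10.0)]     # social zone: mild penalty

def add_proxemic_costs(grid, people_xy, resolution=0.05):
    """Return a copy of a 2-D occupancy/cost grid with proxemic penalties added."""
    cost = grid.astype(float).copy()
    ys, xs = np.indices(grid.shape)
    for px, py in people_xy:
        dist = np.hypot(xs * resolution - px, ys * resolution - py)
        for radius, penalty in PROXEMIC_ZONES:
            cost[dist <= radius] = np.maximum(cost[dist <= radius], penalty)
    # feed this grid to A* (or use it to bias ORCA preferred velocities)
    return cost
```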
Human emotion recognition from visual expressions is an important research area in computer vision and machine learning owing to its significant scientific and commercial potential. Since visual expressions can be captured from different modalities (e.g., facial expressions, body posture, hand pose), multi-modal methods are becoming popular for analyzing human reactions. In contexts in which human emotion detection is performed to associate emotions with certain events or objects, to support decision making, or for further analysis, it is useful to keep this information in semantic repositories, which offer a wide range of possibilities for implementing smart applications. We propose a multi-modal method for human emotion recognition and an ontology-based approach to store the classification results in EMONTO, an extensible ontology to model emotions. The multi-modal method analyzes facial expressions, body gestures, and features from the body and the environment to determine an emotional state; it processes each modality with a specialized deep learning model and applies a fusion method. Our fusion method, called EmbraceNet+, consists of a branched architecture that integrates the EmbraceNet fusion method with other fusion schemes. We experimentally evaluate our multi-modal method on an adaptation of the EMOTIC dataset. Results show that our method outperforms single-modal methods.
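A minimal sketch of storing a recognised emotion in a semantic repository with rdflib. The namespace IRI, class, and property names below are placeholders, since EMONTO defines its own vocabulary; the snippet only illustrates the triple-based storage idea.

```python
# Sketch: record an emotion-recognition result as RDF triples (placeholder vocabulary).
from rdflib import Graph, Namespace, Literal, RDF, URIRef

EMONTO = Namespace("http://example.org/emonto#")   # placeholder IRI, not the real ontology

g = Graph()
g.bind("emonto", EMONTO)

observation = URIRef("http://example.org/obs/001")
g.add((observation, RDF.type, EMONTO.EmotionObservation))      # assumed class name
g.add((observation, EMONTO.hasEmotion, EMONTO.Happiness))      # assumed property/individual
g.add((observation, EMONTO.confidence, Literal(0.87)))         # assumed property
g.add((observation, EMONTO.detectedFrom, Literal("face+body+context")))

print(g.serialize(format="turtle"))
```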
Social robotics is an emerging area that introduces autonomous social robots into social spaces. Social robots offer services, perform tasks, and interact with people in such social environments, demanding more efficient and complex Human–Robot Interaction (HRI) designs. A strategy to improve HRI is to provide robots with the capacity to detect the emotions of the people around them in order to plan a trajectory, modify their behaviour, and generate an appropriate interaction based on the analysed information. However, in social environments in which it is common to find groups of persons, new approaches are needed to make robots able to recognise groups of people and the emotion of the groups, which can also be associated with the scene in which the group is participating. Some existing studies focus on detecting group cohesion and recognising group emotions; nevertheless, these works do not perform the recognition tasks from a robocentric perspective that considers the sensory capacity of robots. In this context, we present a system that recognises scenes in terms of groups of people and then detects the global (prevailing) emotion in each scene. The proposed approach to visualise and recognise emotions in typical HRI is based on the face size of the people recognised by the robot during its navigation (face sizes decrease as the robot moves away from a group of people). In each frame of the visual sensor's video stream, individual emotions are recognised with the Visual Geometry Group (VGG) neural network pre-trained for face recognition (VGGFace); the emotion of the frame is then obtained by aggregating the individual emotions with a fusion method, and the global (prevalent) emotion of the scene (group of people) is obtained by aggregating the emotions of its constituent frames. Additionally, this work proposes a strategy to create image/video datasets to validate the estimation of scene and personal emotions. Both datasets are generated in a simulated environment based on the Robot Operating System (ROS) from videos captured by robots through their sensory capabilities. Tests are performed in two simulated ROS/Gazebo environments: a museum and a cafeteria. Results show an accuracy of 99.79% in the detection of individual emotions, and of 90.84% and 89.78% in the per-frame detection of group (scene) emotion in the cafeteria and museum scenarios, respectively.
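The sketch below illustrates the two aggregation steps described above under assumed details: per-face emotion scores are fused into a frame-level emotion weighted by face size, and the prevailing emotion over a scene's frames is taken by majority vote. The weighting scheme and function names are assumptions, not the paper's exact fusion method.

```python
# Sketch: face-size-weighted frame emotion, then majority emotion over a scene.
import numpy as np
from collections import Counter

EMOTIONS = ["happiness", "neutral", "sadness", "anger"]

def frame_emotion(face_probs, face_areas):
    """face_probs: [n_faces, n_emotions] scores; face_areas: [n_faces] in pixels."""
    weights = np.asarray(face_areas, dtype=float)
    weights /= weights.sum()                       # larger (closer) faces weigh more
    fused = (np.asarray(face_probs) * weights[:, None]).sum(axis=0)
    return EMOTIONS[int(fused.argmax())]

def scene_emotion(frame_labels):
    """Prevailing (majority) emotion across the frames of a scene."""
    return Counter(frame_labels).most_common(1)[0][0]
```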
The presence of service robots in social environments has intensified in recent years, demanding increasingly fluid Human-Robot Interaction (HRI). In this context, the present work develops an architecture, called Erika, that provides a chatbot for interacting via voice and text commands with a service robot, which in turn implements autonomous navigation respecting social restrictions based on proxemic zones. Erika exposes an API that connects the chatbot with a web application, which in turn communicates with the Robot Operating System (ROS) to run simulated experiments and to test the functionality of the chatbot and the social-aware navigation of the robot. Results show that the service robot responds to commands issued through the chatbot and can carry out autonomous or manual navigation when necessary.
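One possible way to relay a chatbot command to ROS from a web-facing application is through rosbridge, as in the hedged sketch below using roslibpy. The topic, intent-to-goal mapping, and host details are assumptions; the Erika architecture may wire the web application to ROS differently.

```python
# Sketch: forward a chatbot intent to the robot as a navigation goal over rosbridge.
import roslibpy

ros = roslibpy.Ros(host='localhost', port=9090)   # rosbridge websocket server (assumed)
ros.run()

goal_topic = roslibpy.Topic(ros, '/move_base_simple/goal', 'geometry_msgs/PoseStamped')

def handle_chat_command(command: str):
    # Hypothetical mapping from a recognised chatbot intent to a navigation goal.
    destinations = {'go to the entrance': (0.0, 0.0), 'go to room a': (3.5, 2.0)}
    if command.lower() in destinations:
        x, y = destinations[command.lower()]
        goal_topic.publish(roslibpy.Message({
            'header': {'frame_id': 'map'},
            'pose': {'position': {'x': x, 'y': y, 'z': 0.0},
                     'orientation': {'w': 1.0}},
        }))

handle_chat_command('go to room A')
ros.terminate()
```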
Many authors have worked on approaches that allow a more realistic and comfortable relationship between humans and robots sharing the same space. This paper proposes a new navigation strategy for social environments that recognizes and considers the social conventions of people and groups. To achieve this, we apply Delaunay triangulation to connect people as the vertices of a triangle network. We then define a complete asymmetric Gaussian function (for individuals and groups) to determine the zones the robot must avoid. Furthermore, we propose a feature generalization scheme, called the socialization feature, to incorporate perception information that can change the variance of the Gaussian function. Simulation results demonstrate that, compared with a standard A* algorithm, the proposed approach modifies the path according to the robot's perception.
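The sketch below illustrates the two ingredients named above: Delaunay triangulation to link detected people, and an asymmetric Gaussian whose variance differs between the person's front, back, and sides. The variance values and function names are illustrative assumptions, not the paper's calibrated parameters.

```python
# Sketch: Delaunay links between people and an asymmetric Gaussian social cost.
import numpy as np
from scipy.spatial import Delaunay

def person_links(positions):
    """positions: [n, 2] array of (x, y); returns index pairs of Delaunay edges."""
    tri = Delaunay(positions)
    edges = set()
    for simplex in tri.simplices:
        for i in range(3):
            edges.add(tuple(sorted((simplex[i], simplex[(i + 1) % 3]))))
    return edges

def asymmetric_gaussian(x, y, px, py, theta, var_front=1.2, var_side=0.6, var_back=0.8):
    """Social cost at (x, y) around a person at (px, py) facing angle theta."""
    dx, dy = x - px, y - py
    # rotate the offset into the person's reference frame
    fx = np.cos(theta) * dx + np.sin(theta) * dy
    fy = -np.sin(theta) * dx + np.cos(theta) * dy
    var_x = var_front if fx >= 0 else var_back   # larger spread in front of the person
    return np.exp(-(fx ** 2 / (2 * var_x) + fy ** 2 / (2 * var_side)))
```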