Touch can have a strong effect on interactions between people, and as such, it is expected to be important to the interactions people have with robots. In earlier work, we showed that the intensity of tactile interaction with a robot can change how much people are willing to take risks. This study further develops our understanding of the relationship between human risk-taking behaviour, the user's physiological responses, and the intensity of tactile interaction with a social robot. We used data collected with physiological sensors while participants played a risk-taking game (the Balloon Analogue Risk Task, or BART). A mixed-effects model served as the baseline for predicting risk-taking propensity from physiological measures, and its results were further improved with two machine learning techniques, support vector regression (SVR) and multi-input convolutional multihead attention (MCMA), to achieve low-latency prediction of risk-taking behaviour during human–robot tactile interaction. Model performance was evaluated with the mean absolute error (MAE), root mean squared error (RMSE), and R squared score (R2); the best result was obtained with MCMA, which yielded an MAE of 3.17, an RMSE of 4.38, and an R2 of 0.93, compared with the baseline's MAE of 10.97, RMSE of 14.73, and R2 of 0.30. The results offer new insights into the interplay between physiological data and the intensity of risk-taking behaviour in predicting human risk-taking behaviour during human–robot tactile interactions. This work illustrates that physiological activation and the intensity of tactile interaction play a prominent role in risk processing during human–robot tactile interaction, and demonstrates that it is feasible to use human physiological and behavioural data to predict risk-taking behaviour in human–robot tactile interaction.
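As an illustration of the evaluation step described above, the following sketch fits an SVR model on physiological features and reports MAE, RMSE, and R2 with scikit-learn. The feature names, synthetic data, and hyperparameters are hypothetical placeholders, not the study's own code or dataset.

```python
# Minimal sketch (not the authors' pipeline): SVR mapping physiological features
# to a BART risk-taking score, scored with the same metrics as in the study.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Hypothetical per-trial features, e.g. skin conductance, heart rate, tactile intensity.
X = rng.normal(size=(500, 3))
# Hypothetical target: average number of balloon pumps (risk-taking propensity).
y = 30 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=2.0, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.5))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.2f}")
```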
Spoken language interaction is at the heart of interpersonal communication, and people flexibly adapt their speech to different individuals and environments. It is surprising that robots, and by extension other digital devices, are not equipped to adapt their speech and instead rely on fixed speech parameters, which often hinder comprehension by the user. We conducted a speech comprehension study involving 39 participants who were exposed to different environmental and contextual conditions. During the experiment, the robot articulated words using different vocal parameters, and the participants were tasked with both recognising the spoken words and rating their subjective impression of the robot's speech. The experiment's primary outcome shows that spaces with good acoustic quality correlate positively with intelligibility and user experience. Increasing the distance between the user and the robot, however, degraded the user experience, while distracting background sounds significantly reduced speech recognition accuracy and user satisfaction. We then built an adaptive voice for the robot. For this, the robot needs to know how difficult it is for a user to understand spoken language in a particular setting. We present a prediction model that rates how annoying the ambient acoustic environment is and, consequently, how hard it is to understand someone in that setting. We then develop a convolutional neural network model that adapts the robot's speech parameters to different users and spaces, while taking into account the influence of ambient acoustics on intelligibility. Finally, we present an evaluation with 27 users, demonstrating superior intelligibility and user experience with adaptive voice parameters compared to a fixed voice.
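To make the adaptation idea concrete, here is a minimal sketch of a convolutional network that maps a log-mel spectrogram of the ambient sound plus the user-robot distance to a few speech parameters. The architecture, input shapes, and output parameterisation are assumptions for illustration, not the model described in the paper.

```python
# Illustrative sketch (assumed architecture): ambient spectrogram + distance -> speech parameters.
import torch
import torch.nn as nn

class SpeechParamCNN(nn.Module):
    def __init__(self, n_params: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # global pooling over time and frequency
        )
        self.head = nn.Sequential(
            nn.Linear(32 + 1, 32), nn.ReLU(),   # +1 for the distance scalar
            nn.Linear(32, n_params),
        )

    def forward(self, spectrogram: torch.Tensor, distance: torch.Tensor) -> torch.Tensor:
        # spectrogram: (batch, 1, n_mels, frames); distance: (batch, 1) in metres
        feats = self.conv(spectrogram).flatten(1)
        return self.head(torch.cat([feats, distance], dim=1))

# Example forward pass with random inputs of plausible shape.
model = SpeechParamCNN()
spec = torch.randn(8, 1, 64, 128)
dist = torch.rand(8, 1) * 3.0
params = model(spec, dist)   # (8, 3): e.g. volume gain, speaking rate, pitch shift
print(params.shape)
```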
Sound events in daily life carry rich information about the objective world, and the composition of these sounds affects the mood of people in a soundscape. Most previous approaches focus only on classifying and detecting audio events and scenes, ignoring their perceptual quality, which can affect a listener's mood in the environment, e.g. annoyance. To this end, this paper proposes a novel hierarchical graph representation learning (HGRL) approach which links objective audio events (AE) with the subjective annoyance ratings (AR) of the soundscape perceived by humans. The hierarchical graph consists of fine-grained event (fAE) embeddings with single-class event semantics, coarse-grained event (cAE) embeddings with multi-class event semantics, and AR embeddings. Experiments show that the proposed HGRL successfully integrates AE with AR for the audio event classification (AEC) and annoyance rating prediction (ARP) tasks, while coordinating the relations between cAE and fAE and further aligning the two grains of AE information with the AR.
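The hierarchical structure can be pictured with a small sketch of one round of message passing from fine-grained event nodes to coarse-grained event nodes and on to an annoyance-rating node. The grouping matrix, layer choices, and dimensions are made-up assumptions for illustration, not the published HGRL implementation.

```python
# Schematic sketch (assumptions only): fAE -> cAE -> AR aggregation over a hierarchical graph.
import torch
import torch.nn as nn

D = 32                                   # embedding dimension
n_fae = 6                                # six fine-grained events grouped into two coarse events

fae = torch.randn(n_fae, D)              # fine-grained event embeddings (e.g. from an audio encoder)
group = torch.tensor([[1., 1., 1., 0., 0., 0.],   # cAE 0 aggregates fAE {0, 1, 2}
                      [0., 0., 0., 1., 1., 1.]])  # cAE 1 aggregates fAE {3, 4, 5}

fae_to_cae = nn.Linear(D, D)
cae_to_ar = nn.Linear(D, D)
ar_head = nn.Linear(D, 1)

# Aggregate fine-grained event embeddings into coarse-grained event embeddings.
cae = torch.relu(fae_to_cae(group @ fae / group.sum(dim=1, keepdim=True)))
# Aggregate coarse-grained event embeddings into an annoyance-rating embedding.
ar = torch.relu(cae_to_ar(cae.mean(dim=0)))
annoyance = ar_head(ar)                  # scalar annoyance rating prediction
print(cae.shape, annoyance.item())
```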