Researchers, industry, and practitioners are increasingly interested in the potential of social robots in education for learners on the autism spectrum. In this study, we conducted semi-structured interviews and focus groups with educators in England to gain their perspectives on the potential use of humanoid robots with autistic pupils, eliciting ideas and specific examples of potential use. Understanding educator views is essential because they are key decision-makers for the adoption of robots and would directly facilitate future use with pupils. Educators were provided with several example images (e.g., NAO, KASPAR, Milo) but did not directly interact with robots or receive information on current technical capabilities. The goal was for educators to respond to the general concept of humanoid robots as an educational tool, rather than to focus on the existing uses or behaviour of a particular robot. Thirty-one autism education staff participated, representing a range of special education settings and age groups as well as multiple professional roles (e.g., teachers, teaching assistants, and speech and language therapists). Thematic analysis of the interview transcripts identified four themes: Engagingness of robots, Predictability and consistency, Roles of robots in autism education, and Need for children to interact with people, not robots. Although almost all interviewees were receptive toward using humanoid robots in the classroom, they were not uncritically approving. Rather, they perceived future robot use as likely posing a series of complex cost-benefit trade-offs over time. For example, they felt that a highly motivating, predictable social robot might increase children's readiness to learn in the classroom, but it could also prevent children from engaging fully with other people or activities. Educator views also assumed that skills learned with a robot would generalise, and that robots' predictability is beneficial for autistic children; both claims need further supporting evidence. These interview results offer many points of guidance to the HRI research community about how humanoid robots could meet the specific needs of autistic learners, as well as identifying issues that will need to be resolved for robots to be both acceptable and successfully deployed in special education contexts.
We present design strategies for Human-Robot Interaction for school-aged autistic children with limited receptive language. Applying these strategies to the DE-ENIGMA project (a large EU project addressing emotion recognition in autistic children) supported the development of a new activity for facial expression imitation, whereby the robot imitates the child's face to encourage the child to notice facial expressions in a play-based game. A usability case study with 15 typically developing children aged 4-6 at an English-language school in the Netherlands was performed to assess the feasibility of the setup and make design revisions before introducing the robot to autistic children.
Individuals with autism are known to face challenges with emotion regulation and to express their affective states in a variety of ways. With this in mind, an increasing amount of research on automatic affect recognition from speech and other modalities has recently been presented to assist and provide support, as well as to improve understanding of autistic individuals' behaviours. Beyond the emotion expressed in the voice, the dynamics of autistic children's verbal speech can be inconsistent and vary greatly among individuals. The current contribution outlines a voice activity detection (VAD) system specifically adapted to autistic children's vocalisations. The presented VAD system is a recurrent neural network (RNN) with long short-term memory (LSTM) cells. It is trained on 130 acoustic Low-Level Descriptors (LLDs) extracted from more than 17 h of audio recordings, which were richly annotated by experts in terms of perceived emotion as well as the occurrence and type of vocalisations. The data consist of recordings of 25 English-speaking autistic children undertaking a structured, partly robot-assisted emotion-training activity and were collected as part of the DE-ENIGMA project. The VAD system is further utilised as a preprocessing step for a continuous speech emotion recognition (SER) task, aiming to minimise the effects of potentially confounding information such as noise, silence, or non-child vocalisation. Its impact on SER performance is compared to that of other VAD systems, including a general VAD system trained on the same data set, an out-of-the-box Web Real-Time Communication (WebRTC) VAD system, and the expert annotations. Our experiments show that the child VAD system achieves a lower performance than our general VAD system trained under identical conditions, with receiver operating characteristic area under the curve (ROC-AUC) values of 0.662 and 0.850, respectively. The SER results show varying performance across valence and arousal depending on the utilised VAD system, with a maximum concordance correlation coefficient (CCC) of 0.263 and a minimum root mean square error (RMSE) of 0.107. Although the performance of the SER models is generally low, the child VAD system can lead to slightly improved results compared to other VAD systems, and in particular the VAD-less baseline, supporting the hypothesised importance of child-specific VAD systems in this context.
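To make the described pipeline concrete, the sketch below shows a minimal frame-level LSTM voice activity detector of the kind the abstract describes, together with the CCC metric used for the downstream SER evaluation. The layer sizes, sequence length, helper names (build_child_vad, ccc), and Keras usage are illustrative assumptions; the paper's exact architecture and hyperparameters are not given in this abstract.

```python
# Sketch of a frame-level LSTM voice activity detector (VAD).
# Assumptions (not from the paper): layer sizes, a 100-frame training
# sequence, and binary frame labels (child vocalisation vs. rest).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

N_LLDS = 130   # acoustic Low-Level Descriptors per frame (from the paper)
SEQ_LEN = 100  # frames per training sequence (assumed)

def build_child_vad():
    model = models.Sequential([
        layers.Input(shape=(SEQ_LEN, N_LLDS)),
        layers.LSTM(64, return_sequences=True),  # temporal context
        layers.LSTM(64, return_sequences=True),
        layers.TimeDistributed(layers.Dense(1, activation="sigmoid")),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="roc_auc")])
    return model

def ccc(y_true, y_pred):
    """Concordance correlation coefficient, the SER evaluation metric."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

# The VAD posteriors gate the SER input: only frames classified as
# child speech are passed on, suppressing noise, silence, and
# non-child vocalisations.
model = build_child_vad()
x = np.random.rand(8, SEQ_LEN, N_LLDS).astype("float32")  # dummy batch
speech_prob = model.predict(x)          # shape (8, SEQ_LEN, 1)
keep_mask = speech_prob[..., 0] > 0.5   # frames forwarded to the SER model
```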
Autism spectrum conditions (ASC) are a set of neurodevelopmental conditions partly characterised by difficulties with communication. Individuals with ASC can show a variety of atypical speech behaviours, including echolalia, or the 'echoing' of another's speech. We herein introduce a new dataset of 15 Serbian children with ASC in a human-robot interaction scenario, annotated for the presence of echolalia amongst other ASC vocal behaviours. From this, we propose a four-class classification problem and investigate the suitability of a 2D convolutional neural network augmented with a recurrent neural network with bidirectional long short-term memory cells for the proposed task of echolalia recognition. In this approach, log Mel-spectrograms are first generated from the audio recordings and then fed as input into the convolutional layers to extract high-level spectral features. The subsequent recurrent layers are applied to learn the long-term temporal context from the obtained features. Finally, we use a feed-forward neural network with softmax activation to assign the class labels. To evaluate the performance of our deep learning approach, we use leave-one-subject-out cross-validation. Key results indicate the suitability of our approach, which achieves an unweighted average recall of 83.5%.
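The convolutional-recurrent architecture described above can be sketched as follows. The number of Mel bands, the input window length, filter counts, and LSTM width are illustrative assumptions (the abstract does not specify them), as is the function name build_echolalia_crnn; only the overall structure (2D convolutions over log Mel-spectrograms, a bidirectional LSTM, and a softmax output over four classes) follows the text.

```python
# Sketch of a CRNN for four-class echolalia recognition from log
# Mel-spectrograms. Input shape and layer sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

N_MELS, N_FRAMES, N_CLASSES = 64, 300, 4  # assumed front-end settings

def build_echolalia_crnn():
    inp = layers.Input(shape=(N_MELS, N_FRAMES, 1))  # (mel, time, channel)
    # Convolutional layers extract high-level spectral features.
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Collapse the frequency axis so the recurrent layer sees a
    # (time, features) sequence.
    x = layers.Permute((2, 1, 3))(x)                 # (time, mel, channel)
    x = layers.Reshape((x.shape[1], -1))(x)
    # Bidirectional LSTM learns long-term temporal context.
    x = layers.Bidirectional(layers.LSTM(64))(x)
    out = layers.Dense(N_CLASSES, activation="softmax")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_echolalia_crnn()
```

Evaluation would then loop over the 15 children, training on 14 and testing on the held-out child in each fold (leave-one-subject-out), and report the unweighted average recall across the four classes.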