In this paper, we evaluate computer vision methods for the skill and emotion assessment of children with Autism Spectrum Disorder (ASD) by extracting bio-behaviors, human activities, child–therapist interactions, and joint pose estimations from video-recorded single- or two-person play-based intervention sessions. A comprehensive dataset of 300 videos was amassed from ASD children engaged in social interaction, and three novel deep learning-based computer vision models were developed: 1) an activity comprehension model to analyze child–play-partner interactions; 2) an automatic joint attention recognition framework using pose; and 3) an emotion and facial expression recognition model. We tested the models on 68 unseen real-world videos of children captured in the clinic and on public datasets. The activity comprehension model achieved an overall accuracy of 72.32%; the joint attention models achieved 97% accuracy for following eye gaze and 93.4% for hand pointing; and the facial expression recognition model achieved an overall accuracy of 95.1%. The proposed models can extract activities and behaviors of interest from free-play and intervention session videos, empowering clinicians with data useful for the diagnosis, assessment, treatment formulation, and monitoring of children with ASD under limited supervision.
Human action recognition (HAR) in untrimmed videos can yield insightful predictions of human behaviour. Previous HAR models were trained on spatial and temporal annotations and could classify only a limited set of actions from trimmed videos. These methods reported limitations such as (1) performance degradation due to the lack of precise temporal region proposals and (2) poor adaptability to the clinical domain because of actions unrelated to those of interest. We propose an innovative method that analyses untrimmed behavioural videos to flag actions of interest, supporting diagnostic and functional assessments for children with Autism Spectrum Disorder (ASD). Our method is an end-to-end behaviour action recognition (BAR) pipeline comprising child detection, temporal action localization, and identification and classification of actions of interest. The model, trained on data from 400 children with ASD and 125 with other developmental delays (ODD), identified ASD, ODD, and neurotypical children with 79.7%, 77.2%, and 80.8% accuracy, respectively. On the independent benchmark Self-Stimulatory Behaviour Dataset (SSBD), the model reported a top-1 accuracy of 78.57% for combined localization and action recognition, significantly higher than previously reported outcomes.
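The three-stage structure of the BAR pipeline (child detection, temporal action localization, then classification of actions of interest) can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: every class, function, and window size here is a hypothetical placeholder standing in for the corresponding learned model.

```python
# Hypothetical sketch of a three-stage behaviour action recognition (BAR)
# pipeline: child detection -> temporal action localization -> classification.
# All names and values below are illustrative assumptions.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ActionSegment:
    start_frame: int
    end_frame: int
    label: str      # e.g. an action of interest such as "hand flapping"
    score: float

def detect_child(frames: List[str]) -> List[Tuple[int, int, int, int]]:
    """Stage 1: locate the child in each frame (one bounding box per frame).
    A real system would run a trained person/child detector here."""
    return [(0, 0, 100, 100) for _ in frames]

def localize_actions(frames: List[str],
                     boxes: List[Tuple[int, int, int, int]]) -> List[Tuple[int, int]]:
    """Stage 2: propose temporal regions likely to contain an action.
    Fixed-size windows stand in for a learned temporal localization model."""
    window = 16
    return [(i, min(i + window, len(frames)))
            for i in range(0, len(frames), window)]

def classify_segment(frames: List[str], span: Tuple[int, int]) -> ActionSegment:
    """Stage 3: classify each proposal as an action of interest (or not).
    A placeholder label/score is returned in place of a trained classifier."""
    start, end = span
    return ActionSegment(start, end, label="unknown", score=0.0)

def bar_pipeline(frames: List[str]) -> List[ActionSegment]:
    boxes = detect_child(frames)
    proposals = localize_actions(frames, boxes)
    return [classify_segment(frames, span) for span in proposals]
```

The design point this sketch captures is that localization and classification are decoupled: imprecise temporal proposals in stage 2 degrade stage 3 regardless of classifier quality, which is the first limitation the abstract attributes to earlier HAR methods.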