Many children with speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game, controlled by the children's voices in real time, to help children practice producing European Portuguese (EP) sibilant consonants. For this, the game uses a sibilant consonant classifier. Since the game does not require any type of adult supervision, children can practice producing these sounds more often, which may lead to faster improvements in their speech. Recently, deep neural networks have brought considerable improvements in classification across a variety of use cases, from image classification to speech and language processing. Here, we propose to use deep convolutional neural networks to classify sibilant phonemes of EP in our serious game for speech and language therapy. We compared the performance of several different artificial neural networks that used Mel frequency cepstral coefficients or log Mel filterbanks. Our best deep learning model achieves classification scores of 95.48% using a 2D convolutional model with log Mel filterbanks as input features. These results are then further improved for specific classes with simple binary classifiers.
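As an illustration of the log Mel filterbank features mentioned above, the following is a minimal sketch using only the Python standard library. The abstract does not state the frame length, sample rate, number of filters, or window type, so the values used here (a 256-sample frame, 16 kHz, 40 triangular filters, a Hann window) and all function names are assumptions chosen for the example, not the authors' configuration.

```python
import cmath
import math


def hz_to_mel(hz):
    # Common (HTK-style) mel scale formula.
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per filter."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points: each filter spans three consecutive points.
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Hann-windowed power spectrum via a naive DFT (adequate for one short frame)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def log_mel_frame(frame, sample_rate, n_filters=40, eps=1e-10):
    """Log of the mel-weighted band energies for a single frame."""
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    power = power_spectrum(frame)
    return [math.log(sum(w * p for w, p in zip(row, power)) + eps) for row in bank]


# Usage: one frame of a synthetic 6 kHz tone, standing in for high-frequency
# sibilant energy (real input would be recorded speech).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 6000 * i / sample_rate) for i in range(256)]
features = log_mel_frame(tone, sample_rate)
```

Stacking such frames over time yields a two-dimensional time-by-frequency array, which is the kind of input a 2D convolutional model can operate on. A practical implementation would use an FFT rather than the naive DFT shown here.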
Children with fricative distortion errors have to learn how to use the vocal folds correctly, and which place of articulation to use, in order to produce the different fricatives correctly. Here we propose a virtual tutor for fricative distortion correction. This is a virtual tutor for speech and language therapy that helps children understand their fricative production errors and how to use their speech organs correctly. The virtual tutor uses log Mel filter banks and deep learning techniques with spectral-temporal convolutions of the data to classify the fricatives in children's speech by place of articulation and voicing. It achieves an accuracy of 90.40% for place of articulation and 90.93% for voicing with children's speech. Furthermore, this paper discusses a multidimensional advanced data analysis of the first-layer convolutional kernel filters that validates the usefulness of performing the convolution on the log Mel filter bank.
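To make the idea of a spectral-temporal convolution concrete, here is a small sketch of a single 2D kernel sliding over a patch of log Mel features, with rows as frequency bands and columns as time frames. The kernel values, patch size, and function names are hypothetical and chosen only for illustration; the abstract does not describe the actual network architecture or kernel shapes.

```python
def conv2d_valid(patch, kernel):
    """'Valid' 2D cross-correlation, the operation deep learning libraries call convolution."""
    rows, cols = len(patch), len(patch[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for r in range(rows - k_rows + 1):
        out_row = []
        for c in range(cols - k_cols + 1):
            acc = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    acc += patch[r + i][c + j] * kernel[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out


def relu(feature_map):
    """Element-wise rectifier applied to a 2D feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy patch: 3 frequency bands x 4 time frames, with energy appearing
# in the two upper bands from the third frame onward.
patch = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]

# Hypothetical kernel spanning 2 bands x 2 frames: it responds when energy
# rises over time in two adjacent bands at once.
onset_kernel = [
    [-1.0, 1.0],
    [-1.0, 1.0],
]

feature_map = relu(conv2d_valid(patch, onset_kernel))
```

Because the kernel covers both axes, its response depends jointly on how energy is distributed across frequency and how it changes over time. In a trained network the kernel values are learned rather than hand-set, and many kernels are applied in parallel.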
The distortion of sibilant sounds is a common type of speech sound disorder in European Portuguese speaking children. Speech and language pathologists (SLPs) use different types of speech production tasks to assess these distortions. One of these tasks consists of the sustained production of isolated sibilants. Using these sound productions, SLPs usually rely on auditory perceptual evaluation to assess the sibilant distortions. Here we propose to use an isolated sibilant machine learning model to help SLPs assess these distortions. Our model uses Mel frequency cepstral coefficients of the isolated sibilant phones and was trained with data from 145 children. The analysis of the false negatives detected by the model can give insight into whether the child has a sibilant production distortion. We were able to confirm that there exists some relation between the model classification results and the distortion assessment of professional SLPs. Approximately 66% of the distortion cases identified by the model are confirmed by an SLP as having some sort of distortion or are perceived as being the production of a different sound.
Many children suffering from speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game that is controlled by the children's voices in real time and that allows children to practice the European Portuguese sibilant consonants. For this, the game uses a sibilant consonant classifier. Since the game does not require any type of adult supervision, children can practice the production of these sounds more often, which may lead to faster improvements of their speech. Recently, the use of deep neural networks has given considerable improvements in classification for a variety of use cases, from image classification to speech and language processing. Here we propose to use deep convolutional neural networks to classify sibilant phonemes of European Portuguese in our serious game for speech and language therapy. We compared the performance of several different artificial neural networks that used Mel frequency cepstral coefficients or log Mel filterbanks. Our best deep learning model achieves classification scores of 95.48% using a 2D convolutional model with log Mel filterbanks as input features.
The SeniorTec program was carried out in the framework of the European project Cordon Gris; its objective was to promote intergenerational relationships. The activity was hosted by the older people who participate in the European project Cordon Gris and was attended by university students from the Higher School of Health of Alcoitão and ISCTE-University Institute of Lisbon. Two learning sessions were delivered in this action: a Nutrition Workshop and a Financial Education Workshop. After this, an intergenerational contact session took place using the Cordon Gris app. The SeniorTec program could promote a more positive image of aging and deconstruct negative stereotypes. Valuing older people in technology can act as a powerful vehicle to improve images of aging and to encourage intergenerational contact.
In order to develop computer tools for speech therapy that reliably classify speech productions, there is a need for speech production corpora that characterize the target population in terms of age, gender, and native language. Apart from including correct speech productions, in order to characterize the target population, the corpora should also include samples from people with speech sound disorders. In addition, the annotation of the data should include information on the correctness of the speech productions. Following these criteria, we collected a corpus that can be used to develop computer tools for speech and language therapy of Portuguese children with sigmatism. The proposed corpus contains European Portuguese children’s word productions in which the words have sibilant consonants. The corpus has productions from 356 children from 5 to 9 years of age. Some important characteristics of this corpus that are relevant to speech and language therapy and computer science research are that (1) the corpus includes data from children with speech sound disorders; and (2) the productions were annotated according to the criteria of speech and language pathologists, and have information about the speech production errors. These are relevant features for the development and assessment of speech processing tools for speech therapy of Portuguese children. In addition, as an illustration of how to use the corpus, we present three speech therapy games that use a convolutional neural network sibilant classifier trained with data from this corpus and a word recognition module trained on additional children's data and calibrated and evaluated with the collected corpus.