Deep neural network architectures for dysarthric speech analysis and recognition

Zaidi, Brahim-Fares; Selouani, Sid‐Ahmed; Boudraa, Malika; Yakoub, Mohammed Sidi

doi:10.1007/s00521-020-05672-2

Cited by 21 publications

(11 citation statements)

References 35 publications


“…The latest attempt at the time of this writing is [24], where a comparison between MFCCs, mel-frequency spectral coefficients, and perceptual linear prediction features extraction approaches was made to develop a dysarthric phoneme recognition system. Then, another comparison was made between CNN and Long-Short-Term Memory neural architectures and benchmarked with the conventional GMM-HMM-based approaches.…”

Section: Related Work (mentioning)

confidence: 99%

“…On the other hand, our proposed solution is not affected by these limitations, as explained in the next section. It is pertinent to note that [24] was excluded from our comparative study since a complete ASR was not proposed, and the phoneme accuracy measured was not comparable to WER or WRA, two criteria usually used to evaluate ASR efficacy.…”

Section: Related Work (mentioning)

confidence: 99%


Speech Vision: An End-to-End Deep Learning-Based Dysarthric Automatic Speech Recognition System

Shahamiri

2021

IEEE Trans. Neural Syst. Rehabil. Eng.


Dysarthria is a disorder that affects an individual's speech intelligibility due to the paralysis of muscles and organs involved in the articulation process. As the condition is often associated with physically debilitating disabilities, not only do such individuals face communication problems, but also interactions with digital devices can become a burden. For these individuals, automatic speech recognition (ASR) technologies can make a significant difference in their lives as computing and portable digital devices can become an interaction medium, enabling them to communicate with others and computers. However, ASR technologies have performed poorly in recognizing dysarthric speech, especially for severe dysarthria, due to multiple challenges facing dysarthric ASR systems. We identified that these challenges are due to the alternation and inaccuracy of dysarthric phonemes, the scarcity of dysarthric speech data, and the phoneme labeling imprecision. This paper reports on our second dysarthric-specific ASR system, called Speech Vision (SV), which tackles these challenges by adopting a novel approach towards dysarthric ASR in which speech features are extracted visually, then SV learns to see the shape of the words pronounced by dysarthric individuals. This visual acoustic modeling feature of SV eliminates phoneme-related challenges. To address the data scarcity problem, SV adopts visual data augmentation techniques, generates synthetic dysarthric acoustic visuals, and leverages transfer learning. Benchmarking with other state-of-the-art dysarthric ASR considered in this study, SV outperformed them by improving recognition accuracies for 67% of UA-Speech speakers, where the biggest improvements were achieved for severe dysarthria.


“…If the objects are rotated by an angle of α, then its corresponding polar coordinate changes to (τr, θ + α). After log-polar transformation, the mapping is as Equations (8)-(10).…”

Section: Log-polar Transformation (mentioning)

confidence: 99%

“…In recent years, deep learning has played an important role in many areas of life, such as image processing [1][2][3], object detection [4][5][6], optic imaging [7][8][9], and speech recognition [10,11]. Especially in object detection and recognition, the accuracy of deep learning models becomes increasingly important [12].…”

Section: Introduction (mentioning)

confidence: 99%

LPNet: Retina Inspired Neural Network for Object Detection and Recognition

et al. 2021


The detection of rotated objects is a meaningful and challenging research work. Although the state-of-the-art deep learning models have feature invariance, especially convolutional neural networks (CNNs), their architectures were not specifically designed for rotation invariance. They only slightly compensate for this feature through pooling layers. In this study, we propose a novel network, named LPNet, to solve the problem of object rotation. LPNet improves the detection accuracy by combining retina-like log-polar transformation. Furthermore, LPNet is a plug-and-play architecture for object detection and recognition. It consists of two parts, which we name the encoder and the decoder. The encoder extracts image features in log-polar coordinates, while the decoder eliminates image noise in Cartesian coordinates. Moreover, according to the movement of center points, LPNet has stable and sliding modes. LPNet takes the single-shot multibox detector (SSD) network as the baseline network and the visual geometry group (VGG16) as the feature extraction backbone network. The experiment results show that, compared with conventional SSD networks, the mean average precision (mAP) of LPNet increased by 3.4% for regular objects and by 17.6% for rotated objects.


“…In 2021, Brahim et al. [17] found that the CNN-based system using perceptual linear prediction features achieved an impressive 82% recognition rate, which represents an improvement of 11% and 32% over the LSTM- and GMM-HMM-based systems, respectively, compared to the widely used MFCC.…”

mentioning

confidence: 99%

Re-Talk: Automated Speech Assistance for People with Dysarthria

Ali, Hassan, Salah et al.

2023

النشرة المعلوماتیة فی الحاسبات والمعلومات


Dysarthria is a speech motor disorder where the muscles responsible for speech production, such as in the face, mouth, or respiratory system, have trouble coordinating and controlling themselves. Our research goal is to help individuals with dysarthria communicate effectively. Often, physical conditions make it challenging for them to express their thoughts through writing. Our research introduces an automatic speech assistant solution, consisting of two main parts: speech recognition and auto-correct. The speech recognition component takes the person's distorted speech as input, converts it to text, and then sends it to the auto-correct module to fix any mistakes or unclear words. We tested our model on both English and Arabic datasets. The English dataset showed a 50% Word Error Rate (WER) which was reduced to 40% after using the auto-correct module. Our results outperformed previous studies by 4.5%. However, the WER on the Arabic dataset was 80% which is not a satisfactory result, due to the limited size of the Egyptian Dialect Dysarthric Speech (EDDS) database.
