“…This again justifies the performance of LSTM in UA-Speech data classification, being worse than the rest. The temporal information identified by the LSTM model from the common words is not sufficient enough to identify the severity level from the uncommon words [25]. This is checked by using a mixed-up data for training and testing, and an accuracy of 88.59% is obtained, which validates the inference.…”
Section: Results and Discussion, A. Analysing MFCCs and CQCCs (E1)
Citation type: mentioning (confidence: 74%)
“…As the model grows in depth with increasing n, the upper layers find efficient feature representations that generalise well across the datasets. Thus, an increase in accuracy was observed up to n = 4 for both the databases on using MFCCs with DNNs [25]. While labelling the graphs, UA-Speech is referred to as UAS.…”
Section: Results and Discussion, A. Analysing MFCCs and CQCCs (E1)
Citation type: mentioning (confidence: 89%)
“…• Performance analysis of the basic deep learning architectures namely, DNN, CNN, gated recurrent units (GRU), and LSTM using MFCCs and CQCCs. Our initial phase of work using MFCCs is reported in [25]. • Assessment of prosodic, glottal, phonetic, and articulatory features on DNN classifiers.…”
Section: B. Contribution
Citation type: mentioning (confidence: 99%)
“…They are more closely related to the human perception system, by giving a higher frequency resolution at lower frequencies and higher temporal resolution at higher frequencies. With these understandings, we perform the first experiment (E1), where the basic deep learning strategies, namely DNN, CNN, GRU and LSTM are employed for classification, with MFCCs [25] and CQCCs as features.…”
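The resolution trade-off described in the snippet above follows from the constant-Q spacing itself: bins are placed geometrically, so the ratio of centre frequency to bandwidth (the Q factor) is identical for every bin. A minimal numpy sketch, with an illustrative `f_min` and 12 bins per octave that are not values taken from the cited papers:

```python
import numpy as np

# Constant-Q analysis places bin centres geometrically: f_k = f_min * 2^(k/B),
# where B is the number of bins per octave. These parameter values are
# illustrative, not taken from the cited papers.
def constant_q_centres(f_min=32.7, n_bins=48, bins_per_octave=12):
    k = np.arange(n_bins)
    return f_min * 2.0 ** (k / bins_per_octave)

centres = constant_q_centres()

# The bandwidth of bin k is the spacing to the next centre,
# bw_k = f_k * (2^(1/B) - 1), so Q = f_k / bw_k is the same for every bin.
bandwidths = centres * (2.0 ** (1.0 / 12) - 1.0)
q = centres / bandwidths

# Constant Q means narrow bins (fine frequency resolution) at low
# frequencies and wide bins (fine time resolution) at high frequencies.
print(np.allclose(q, q[0]))  # True
```

Because Q is fixed, low-frequency bins are narrow in hertz while high-frequency bins are wide, which is exactly the perceptually motivated trade-off the snippet attributes to CQCCs.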
Assessing the severity level of dysarthria can provide insight into a patient's improvement, assist pathologists in planning therapy, and aid automatic dysarthric speech recognition systems. In this article, we present a comparative study on the classification of dysarthria severity levels using different deep learning techniques and acoustic features. First, we evaluate basic architectural choices, namely the deep neural network (DNN), convolutional neural network, gated recurrent units, and long short-term memory network, using the basic speech features Mel-frequency cepstral coefficients (MFCCs) and constant-Q cepstral coefficients. Next, speech-disorder-specific features computed from prosody, articulation, phonation, and glottal functioning are evaluated on DNN models. Finally, we explore the utility of low-dimensional feature representations obtained through subspace modeling, namely i-vectors, which are then classified using DNN models. Evaluation is performed on the standard UA-Speech and TORGO databases. With an accuracy of 93.97% under the speaker-dependent scenario and 49.22% under the speaker-independent scenario on the UA-Speech database, the DNN classifier using MFCC-based i-vectors outperforms the other systems.
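The abstract's final stage maps variable-length utterances to fixed, low-dimensional vectors before DNN classification. True i-vector extraction fits a total-variability matrix over GMM supervector statistics; as a rough stand-in for that subspace modeling step, the sketch below pools frame-level features into utterance statistics and projects them with PCA. All shapes and the random data are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 20 utterances, 100 frames each, 13 MFCCs per frame,
# projected down to 5 dimensions. None of these values come from the paper.
n_utts, n_frames, n_mfcc, dim = 20, 100, 13, 5

# Fake frame-level MFCC matrices, one per utterance.
utts = [rng.normal(size=(n_frames, n_mfcc)) for _ in range(n_utts)]

# Utterance-level statistics: concatenated mean and std over frames,
# giving one fixed-length vector per utterance regardless of duration.
stats = np.stack([np.concatenate([u.mean(0), u.std(0)]) for u in utts])

# PCA via SVD of the centred statistics matrix: keep the top `dim`
# directions as a low-dimensional utterance embedding.
centred = stats - stats.mean(0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
embeddings = centred @ vt[:dim].T  # shape (n_utts, dim)
```

The resulting fixed-length embeddings play the role the i-vectors play in the paper: a compact per-utterance representation that a DNN classifier can consume directly.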
“…Nonetheless, deep learning algorithms have shown to deliver state-of-the-art performances when dealing with unstructured data such as speech in comparison to shallow algorithms. In terms of deep learning algorithms to perform reference-free intelligibility assessment, we can refer to [16], where multiple standard deep learning architectures were built and evaluated on UA-Speech and TORGO [17] corpora. In particular, a fully connected dense neural network, a convolutional neural network (CNN), and a long short-term memory network (LSTM) were considered in this study.…”
Recent advances in deep learning have provided an opportunity to improve and automate dysarthria intelligibility assessment, offering a cost-effective, accessible, and less subjective way to assess dysarthric speakers. However, a review of the literature shows that, among previous studies that yielded very high accuracies, generalization to new dysarthric patients was measured improperly or incompletely because of gaps in the adopted evaluation methodologies. This is particularly important because any practical, clinical application of intelligibility assessment must generalize reliably to new patients; otherwise, clinicians cannot accept the assessment results the system provides. In this paper, after explaining these gaps, we report on an extensive investigation to propose an optimal setup for deep learning-based dysarthric intelligibility assessment. We then describe the different evaluation strategies applied to verify thoroughly how the optimal setup performs on new speakers and across different classes of speech intelligibility. Finally, a comparative study benchmarks our proposed optimal setup against the state of the art by adopting evaluation strategies similar to those of previous studies. Results indicate an average classification accuracy of 78.2% for unforeseen low-intelligibility speakers, 40.6% for moderate-intelligibility speakers, and 40.4% for high-intelligibility speakers. Furthermore, we observed a high variance in classification accuracy among individual speakers. Finally, our proposed optimal setup delivered an average classification accuracy of 97.19% when adopting an evaluation strategy similar to that used by previous studies.
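The generalization gap this abstract highlights comes down to the evaluation protocol: in a speaker-independent split, no speaker contributes utterances to both the training and the test set. A minimal sketch, with made-up speaker IDs and placeholder features (not data from either corpus):

```python
# Speaker-independent evaluation: partition utterances by speaker so that
# test speakers are entirely unseen during training. Speaker IDs and the
# placeholder feature strings below are hypothetical.
def speaker_independent_split(utterances, test_speakers):
    """Split (speaker_id, features) pairs so test speakers never appear in train."""
    train = [u for u in utterances if u[0] not in test_speakers]
    test = [u for u in utterances if u[0] in test_speakers]
    return train, test

data = [("F02", "x1"), ("F02", "x2"), ("M01", "x3"), ("M05", "x4")]
train, test = speaker_independent_split(data, test_speakers={"M01"})

# No speaker appears on both sides of the split.
print({s for s, _ in train} & {s for s, _ in test})  # set()
```

A speaker-dependent protocol, by contrast, shuffles utterances freely, so the same voice can appear in both sets; this is one way a study can report very high accuracy that does not transfer to new patients.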