Comparative Analysis of CNN and RNN for Voice Pathology Detection

Syed, Sidra Abid; Rashid, Munaf; Hussain, Samreen; Zahid, Hina

doi:10.1155/2021/6635964

Cited by 40 publications

(19 citation statements)

References 21 publications

“…This data is a collection of vowels /a/, /i/, and /u/ and “Good Morning, how are you?” sentences, recorded with normal, low, high, rising, and falling pitch, available in both English and German languages. However, utilizing the /a/ vocalization subset of SVD remarks good classification results and is used in the literature [35, 55]. For our analysis, we have used the /a/ vowel phonation with a normal pitch in the English language.…”

Section: Methods (mentioning)

confidence: 99%

Neurogenerative Disease Diagnosis in Cepstral Domain Using MFCC with Deep Learning

Alghamdi, Zakariah, Hoang, et al. 2022

Computational and Mathematical Methods in Medicine

Because underlying cognitive and neuromuscular activities regulate speech signals, biomarkers in the human voice can provide insight into neurological illnesses. Multiple motor and nonmotor aspects of neurologic voice disorders arise from an underlying neurologic condition such as Parkinson’s disease, multiple sclerosis, myasthenia gravis, or ALS. Voice problems can be caused by disorders that affect the corticospinal system, cerebellum, basal ganglia, and upper or lower motoneurons. According to a new study, voice pathology detection technologies can successfully aid in the assessment of voice irregularities and enable the early diagnosis of voice pathology. In this paper, we offer two deep-learning-based computational models, 1-dimensional convolutional neural network (1D CNN) and 2-dimensional convolutional neural network (2D CNN), that simultaneously detect voice pathologies caused by neurological illnesses or other causes. From the German corpus Saarbruecken Voice Database (SVD), we used voice recordings of sustained vowel /a/ generated at normal pitch. The collected voice signals are padded and segmented to maintain homogeneity and increase the number of samples. Convolutional layers are applied to raw data, and MFCC features are extracted in this project. Although the 1D CNN had the maximum accuracy of 93.11% on test data, its training showed overfitting; the 2D CNN generalized better and had lower training and validation loss despite a lower accuracy of 84.17% on test data. Also, 2D CNN outperforms state-of-the-art studies in the field, implying that a model trained on handcrafted features is better for speech processing than a model that extracts features directly.
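
The abstract above says the voice signals are padded and segmented to keep them homogeneous and to increase the number of samples, but it does not give segment lengths or the padding scheme. The following is only a minimal sketch of one common way to do this, assuming zero-padding at the end of the recording and non-overlapping fixed-length segments. The function name `pad_and_segment` and the toy values are hypothetical and are not taken from the paper.

```python
def pad_and_segment(signal, segment_len, pad_value=0.0):
    """Pad a 1-D signal to a multiple of segment_len, then split it.

    Assumption (not from the paper): padding is appended at the end
    and segments do not overlap.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    padded = list(signal)
    remainder = len(padded) % segment_len
    if remainder:
        # Append just enough padding to complete the last segment.
        padded.extend([pad_value] * (segment_len - remainder))
    # Slice the padded signal into equal-length, non-overlapping chunks.
    return [padded[i:i + segment_len]
            for i in range(0, len(padded), segment_len)]


# Toy recording of 7 samples, split into segments of 3 samples each.
recording = [0.1, -0.2, 0.3, 0.0, 0.5, -0.4, 0.2]
segments = pad_and_segment(recording, segment_len=3)
```

Each returned segment has the same length, so one recording yields several equally shaped training samples; in a real pipeline each segment would then be fed to the network directly or converted to MFCC features first.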

“…Clinicians have been using sounds and acoustic data such as acoustic data to diagnose various conditions: voice pathologies, dry and wet cough, sleep disorders, and more [28][29][30][31][32][33][34]. Recently, several works also exploited sound data for large-scale COVID screening.…”

Section: Related Work (mentioning)

confidence: 99%

Reliability of crowdsourced data and patient-reported outcome measures in cough-based COVID-19 screening

Xiong, Berkovsky, Kâafar, et al. 2022

Sci Rep

Mass community testing is a critical means for monitoring the spread of the COVID-19 pandemic. Polymerase chain reaction (PCR) is the gold standard for detecting the causative coronavirus 2 (SARS-CoV-2) but the test is invasive, test centers may not be readily available, and the wait for laboratory results can take several days. Various machine learning based alternatives to PCR screening for SARS-CoV-2 have been proposed, including cough sound analysis. Cough classification models appear to be a robust means to predict infective status, but collecting reliable PCR confirmed data for their development is challenging and recent work using unverified crowdsourced data is seen as a viable alternative. In this study, we report experiments that assess cough classification models trained (i) using data from PCR-confirmed COVID subjects and (ii) using data of individuals self-reporting their infective status. We compare performance using PCR-confirmed data. Models trained on PCR-confirmed data perform better than those trained on patient-reported data. Models using PCR-confirmed data also exploit more stable predictive features and converge faster. Crowd-sourced cough data is less reliable than PCR-confirmed data for developing predictive models for COVID-19, and raises concerns about the utility of patient reported outcome data in developing other clinical predictive models when better gold-standard data are available.

“…We determined that the CNN approach was more appropriate for the current dataset as we had a limited number of samples from which we aimed to test our model. Previous work by You, Liu and Chen [32] observed that more complicated neural networks (i.e., RNNs and/or hybrid models) may result in lower accuracy and/or fail to converge when trying to model "relatively" small datasets and Zhang et al [29] observed that RNNs are more computationally expensive compared to CNNs, with potentially little increases in accuracy [33]. Therefore we implemented a convolutional neural network (CNN) model for voice samples machine learning training, which relied on image feature transformation (see [34] for a similar example).…”

Section: Feature Extraction (mentioning)

confidence: 99%

On a Vector towards a Novel Hearing Aid Feature: What Can We Learn from Modern Family, Voice Classification and Deep Learning Algorithms

et al. 2021

(1) Background: The application of machine learning techniques in the speech recognition literature has become a large field of study. Here, we aim to (1) expand the available evidence for the use of machine learning techniques for voice classification and (2) discuss the implications of such approaches towards the development of novel hearing aid features (i.e., voice familiarity detection). To do this, we built and tested a Convolutional Neural Network (CNN) Model for the identification and classification of a series of voices, namely the 10 cast members of the popular television show “Modern Family”. (2) Methods: Representative voice samples were selected from Season 1 of Modern Family (N = 300; 30 samples for each of the classes of the classification in this model, namely Phil, Claire, Hailey, Alex, Luke, Gloria, Jay, Manny, Mitch, Cameron). The audio samples were then cleaned and normalized. Feature extraction was then implemented and used as the input to train a basic CNN model and an advanced CNN model. (3) Results: Accuracy of voice classification for the basic model was 89%. Accuracy of the voice classification for the advanced model was 99%. (4) Conclusions: Greater familiarity with a voice is known to be beneficial for speech recognition. If a hearing aid can eventually be programmed to recognize voices that are familiar or not, perhaps it can also apply familiar voice features to improve hearing performance. Here we discuss how such machine learning, when applied to voice recognition, is a potential technological solution in the coming years.
