Transfer Learning with AudioSet to Voice Pathologies Identification in Continuous Speech (2019)
DOI: 10.1016/j.procs.2019.12.233

Cited by 28 publications (20 citation statements)
References 9 publications
“…Also, there is not much work done for voice pathology using a convolutional neural network. Only Guedes et al. [18] designed such a system and reported an accuracy of 80%, and Zhang et al. [19] also used a DNN model, but the outcomes were not reported. So, after a detailed literature review, it was concluded that a novel system could be proposed using pitch, 13 MFCCs, roll-off, ZCR, energy entropy, spectral flux, spectral centroid, and energy as features and an RNN as the classifier.…”
Section: Related Work (mentioning)
confidence: 99%
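The feature set named in the statement above (pitch, 13 MFCCs, roll-off, ZCR, energy entropy, spectral flux, spectral centroid, and energy) corresponds to standard short-time audio descriptors. The sketch below is not taken from the cited work; it is a minimal illustration, assuming librosa, a 16 kHz mono recording, and illustrative frame parameters, of how such per-frame features could be extracted before feeding an RNN classifier.

```python
# Hypothetical sketch (not from the cited paper): extract the per-frame features
# named in the citation statement with librosa, assuming a 16 kHz mono recording.
import numpy as np
import librosa

def extract_features(path, sr=16000, n_fft=1024, hop=256):
    y, sr = librosa.load(path, sr=sr, mono=True)
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))  # magnitude spectrogram

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=n_fft, hop_length=hop)
    rolloff = librosa.feature.spectral_rolloff(S=S, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=n_fft, hop_length=hop)
    centroid = librosa.feature.spectral_centroid(S=S, sr=sr)
    energy = librosa.feature.rms(S=S)                          # short-time energy (RMS)

    # Pitch track via the YIN estimator (range chosen for adult speech).
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr, frame_length=n_fft, hop_length=hop)

    # Spectral flux: frame-to-frame change of the normalized magnitude spectrum.
    S_norm = S / (S.sum(axis=0, keepdims=True) + 1e-10)
    flux = np.sqrt((np.diff(S_norm, axis=1) ** 2).sum(axis=0))
    flux = np.concatenate([[0.0], flux])

    # Energy entropy: entropy of the per-frame sub-band energy distribution.
    E = S ** 2
    p = E / (E.sum(axis=0, keepdims=True) + 1e-10)
    entropy = -(p * np.log2(p + 1e-10)).sum(axis=0)

    # Align lengths (frame counts can differ by one between extractors) and stack.
    n = min(mfcc.shape[1], S.shape[1], zcr.shape[1], len(f0))
    feats = np.vstack([
        mfcc[:, :n], rolloff[:, :n], zcr[:, :n], centroid[:, :n],
        energy[:, :n], f0[None, :n], flux[None, :n], entropy[None, :n],
    ])
    return feats.T  # (frames, 20) sequence, suitable as RNN input
```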
“…Instead, this paper focuses on two publicly available databases: the Saarbruecken Voice Database (SVD) [12][13][14][15][16] and Voice ICar fEDerico II (VOICED) [16][17][18]. The following is a summary of existing approaches applied to the SVD.…”
Section: Literature Review (mentioning)
confidence: 99%
“…The SMO-based SVM yielded the best performance in accuracy (0.858), sensitivity (0.876), and specificity (0.839). Guedes et al. [14] proposed two approaches, long short-term memory (LSTM) and convolutional neural network (CNN), for differentiating between healthy and dysphonic candidates, healthy and laryngitic candidates, and healthy and paralyzed candidates. The achieved precision values were 0.66, 0.67, and 0.78, respectively.…”
Section: Literature Review (mentioning)
confidence: 99%
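The LSTM/CNN comparison described in the statement above is, at its core, binary sequence classification (healthy vs. pathological voice). The following is a minimal Keras sketch of an LSTM variant of that setup, assuming fixed-length sequences of frame-level features; the input shape and layer sizes are illustrative assumptions, not the architecture of the cited paper.

```python
# Hypothetical sketch (not the cited architecture): an LSTM binary classifier for
# healthy-vs-pathological voice, assuming (frames, features) input sequences.
import tensorflow as tf

def build_lstm_classifier(n_frames=300, n_features=20):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_frames, n_features)),
        tf.keras.layers.Masking(mask_value=0.0),         # ignore zero-padded frames
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(pathological)
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy", tf.keras.metrics.Precision()])
    return model

# Usage: model = build_lstm_classifier(); model.fit(X_train, y_train, epochs=20)
```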