2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2018
DOI: 10.1109/embc.2018.8513222

Convolutional Neural Networks for Pathological Voice Detection

Abstract: Acoustic analysis using signal processing tools can be used to extract voice features to distinguish whether a voice is pathological or healthy. The proposed work uses spectrogram of voice recordings from a voice database as the input to a Convolutional Neural Network (CNN) for automatic feature extraction and classification of disordered and normal voice. The novel classifier achieved 88.5%, 66.2% and 77.0% accuracy on training, validation and testing data set respectively on 482 normal and 482 organic dyspho…

Citations: cited by 36 publications (22 citation statements)
References: 14 publications
“…However, other studies have used continuous speech samples for analyses [6]. CNNs extract features automatically from the spectrogram of voice recordings for dysphonia diagnosis, and a larger amount of training data yields better results [32]. Therefore, the CNN used here may have extracted more features from these entire voice samples, thereby achieving better training results with our model.…”
Section: Comparison With Prior Work (mentioning)
confidence: 99%
“…This work also uses the SVD, which was recorded by the Institute of Phonetics of the Saarland University in Germany [11]. We use the sustained vowel /a/ sound recorded from each individual at neutral pitch, of which 482 are healthy and 482 are diagnosed with various pathologies (140 laryngitis, 41 leukoplakia, 68 Reinke's edema, 213 recurrent laryngeal nerve paralyses, 22 vocal fold carcinoma, and 45 vocal fold polyps) [6], [8]. The extracted subset is the same as that described in previous studies [6], [8] to enable the comparison of our results with those of the aforementioned previous studies.…”
Section: Database (mentioning)
confidence: 99%
“…The deep neural network (DNN) algorithm could fully utilize the acoustic features and efficiently differentiate between normal and pathological voice samples [5]. A convolutional neural network (CNN) and short-time Fourier transform (STFT) were used for the reliable classification and feature detection of voice pathologies [6]. Mel-frequency cepstral coefficients (MFCC) derived from the Saarbruecken voice database (SVD) are analyzed using an artificial neural network (ANN) and a support vector machine (SVM).…”
Section: Introduction (mentioning)
confidence: 99%
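
The STFT front end mentioned in the statement above can be sketched roughly as follows. This is only a minimal illustration, not the cited authors' pipeline: the frame length, hop size, sampling rate and the synthetic test tone are all assumptions, and `stft_spectrogram` is a hypothetical helper name. A real system would load sustained-vowel recordings instead of a sine wave.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=512, hop=256):
    """Log-magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame; keep the magnitude only.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels; the small constant avoids log(0).
    return 20.0 * np.log10(magnitude + 1e-10).T  # shape: (freq_bins, n_frames)

# Assumed stand-in for a recording: 1 s of a 250 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 250 * t)
spec = stft_spectrogram(tone)
```

The resulting 2-D array (frequency bins by time frames) is the kind of image-like input a CNN classifier could consume, typically after resizing or normalization.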
“…Speech analysis is used in speech therapy to detect and diagnose speech disorders [15,16]. In the future, it will be possible to assess quantitatively how a patient's speech functions improve during and after treatment (as opposed to a subjective assessment of progress).…”
Section: Speech analysis (unclassified)