Voice Pathology Detection Using Deep Learning: a Preliminary Study

Harár, Pavol; Alonso-Hernández, Jesus B.; Mekyska, Jiří; Galáž, Zoltán; Bürget, Radim; Smékal, Zdeněk

doi:10.1109/iwobi.2017.7985525

Cited by 73 publications

(36 citation statements)

References 17 publications

(21 reference statements)


“…The advantage of this task in comparison with other commonly used vocal tasks is its independence of articulatory and other linguistic confounds [38]. Moreover, it is also present in most of the databases and therefore the experiments proposed in our work are comparable with other commonly used databases [39,40].…”

Section: Vocal Tasks (mentioning)

confidence: 65%

Changes in Phonation and Their Relations with Progress of Parkinson’s Disease

et al. 2018

Self Cite


Hypokinetic dysarthria, which is associated with Parkinson's disease (PD), affects several speech dimensions, including phonation. Although the scientific community has dealt with a quantitative analysis of phonation in PD patients, a complex research revealing probable relations between phonatory features and progress of PD is missing. Therefore, the aim of this study is to explore these relations and model them mathematically to be able to estimate progress of PD during a two-year follow-up. We enrolled 51 PD patients who were assessed by three commonly used clinical scales. In addition, we quantified eight possible phonatory disorders in five vowels. To identify the relationship between baseline phonatory features and changes in clinical scores, we performed a partial correlation analysis. Finally, we trained XGBoost models to predict the changes in clinical scores during a two-year follow-up. For two years, the patients' voices became more aperiodic with increased microperturbations of frequency and amplitude. Next, the XGBoost models were able to predict changes in clinical scores with an error in range 11-26%. Although we identified some significant correlations between changes in phonatory features and clinical scores, they are less interpretable. This study suggests that it is possible to predict the progress of PD based on the acoustic analysis of phonation. Moreover, it recommends utilizing the sustained vowel /i/ instead of /a/.


“…For the above mentioned experiments, we decided to analyze the performance of the voice pathology detection models using multiple types of input data: a) raw audio samples to follow our previous work [22] and further explore possibilities of robust voice pathology detection without manually-selected features (DenseNet), b) conventional acoustic (dysphonic) features to follow the previously published works and quantify most common vocal pathologies (XGBoost, Isolation Forest), c) spectrograms to achieve a reasonable trade-off between dimensionality of the data and amount of information (DenseNet), and d) MFCC to follow the previous works focusing on voice and speech modelling, and voice pathology detection (all models).…”

Section: Methods (mentioning)

confidence: 99%

“…livered state-of-the-art results in many domains including speech processing. To our best knowledge, despite our previous work [22], there are no other papers using deep learning algorithms for voice pathology detection. Next, we also employ the conventional voice pathology detection approach based on acoustic feature extraction procedure.…”

Section: Introduction (mentioning)

confidence: 97%

Towards robust voice pathology detection

Harár, Galáž, Alonso-Hernández et al. 2018

Neural Comput & Applic

Self Cite


Automatic objective non-invasive detection of pathological voice based on computerized analysis of acoustic signals can play an important role in early diagnosis, progression tracking and even effective treatment of pathological voices. In search towards such a robust voice pathology detection system we investigated 3 distinct classifiers within supervised learning and anomaly detection paradigms. We conducted a set of experiments using a variety of input data such as raw waveforms, spectrograms, mel-frequency cepstral coefficients (MFCC) and conventional acoustic (dysphonic) features (AF). In comparison with previously published works, this article is the first to utilize combination of 4 different databases comprising normophonic and pathological recordings of sustained phonation of the vowel /a/ unrestricted to a subset of vocal pathologies. Furthermore, to our best knowledge, this article is the first to explore gradient boosted trees and deep learning for this application. The following best classification performances measured by F1 score on dedicated test set were achieved: XGBoost (0.733) using AF and MFCC, DenseNet (0.621) using MFCC, and Isolation Forest (0.610) using AF. Even though these results are of exploratory character, conducted experiments do show promising potential of gradient boosting and deep learning methods to robustly detect voice pathologies.


“…Precision (P) shows how many of the pathological voice files classified are relevant, and F1-score (F1) has also been taken into account, calculated as in  It can be seen from Table V that the classifier achieved overall accuracy (ACC) of 88%, 66% and 77% on training dataset, validation dataset and testing dataset respectively. Compared to [11], spectrogram features show greater performance on pathological voice detection than raw time-domain signals. Moreover, the proposed algorithm is shown to be more robust for dealing with large amount of data compared to [6,8,9].…”

Section: Results (mentioning)

confidence: 99%
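
The statement above names precision and F1-score, but the excerpt omits the formula it refers to. As a rough illustration only, the standard definitions can be sketched in Python from confusion-matrix counts. The function name `precision_recall_f1` and the example counts are hypothetical and are not taken from the cited paper; pathological voice is assumed to be the positive class.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts.

    tp: pathological recordings correctly flagged as pathological
    fp: healthy recordings wrongly flagged as pathological
    fn: pathological recordings missed by the classifier
    Empty denominators yield 0.0 instead of raising ZeroDivisionError.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    p, r, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Unlike overall accuracy, F1 ignores true negatives, which is one reason it is often reported alongside accuracy when the healthy and pathological classes differ in size.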

“…This is questionable compared to [10] using GMM-HMM which achieves 67.00% accuracy when the data amount is large. In [11], Deep Learning has been used for the first time, applying Long Short-Term Memory (LSTM), a type of recurrent neural network and using information from the time-domain axis. However, since pathological voice contains information without regard to time, this model might not be the most proper one for this problem.…”

Section: Convolutional Neural Network For Pathological Voice Detection (mentioning)

confidence: 99%

Convolutional Neural Networks for Pathological Voice Detection

Soraghan, Lowit et al. 2018

2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)


Acoustic analysis using signal processing tools can be used to extract voice features to distinguish whether a voice is pathological or healthy. The proposed work uses spectrogram of voice recordings from a voice database as the input to a Convolutional Neural Network (CNN) for automatic feature extraction and classification of disordered and normal voice. The novel classifier achieved 88.5%, 66.2% and 77.0% accuracy on training, validation and testing data set respectively on 482 normal and 482 organic dysphonia speech files. It reveals that the proposed novel algorithm on the Saarbruecken Voice Database can effectively been used for screening pathological voice recordings.
