2017 International Conference and Workshop on Bioinspired Intelligence (IWOBI) 2017
DOI: 10.1109/iwobi.2017.7985525
|View full text |Cite
|
Sign up to set email alerts
|

Voice Pathology Detection Using Deep Learning: a Preliminary Study

Abstract: This paper describes a preliminary investigation of Voice Pathology Detection using Deep Neural Networks (DNN). We used voice recordings of sustained vowel /a/ produced at normal pitch from German corpus Saarbruecken Voice Database (SVD). This corpus contains voice recordings and electroglottograph signals of more than 2 000 speakers. The idea behind this experiment is the use of convolutional layers in combination with recurrent Long-Short-Term-Memory (LSTM) layers on raw audio signal. Each recording was spli… Show more

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1
1

Citation Types

0
35
0
1

Year Published

2018
2018
2024
2024

Publication Types

Select...
5
2
1

Relationship

2
6

Authors

Journals

citations
Cited by 73 publications
(36 citation statements)
references
References 17 publications
(21 reference statements)
0
35
0
1
Order By: Relevance
“…The advantage of this task in comparison with other commonly used vocal tasks is its independence of articulatory and other linguistic confounds [38]. Moreover, it is also present in most of the databases and therefore the experiments proposed in our work are comparable with other commonly used databases [39,40].…”
Section: Vocal Tasksmentioning
confidence: 65%
“…The advantage of this task in comparison with other commonly used vocal tasks is its independence of articulatory and other linguistic confounds [38]. Moreover, it is also present in most of the databases and therefore the experiments proposed in our work are comparable with other commonly used databases [39,40].…”
Section: Vocal Tasksmentioning
confidence: 65%
“…For the above mentioned experiments, we decided to analyze the performance of the voice pathology detection models using multiple types of input data: a) raw audio samples to follow our previous work [22] and fur- ther explore possibilities of robust voice pathology detection without manually-selected features (DenseNet), b) conventional acoustic (dysphonic) features to follow the previously published works and quantify most common vocal pathologies (XGBoost, Isolation Forest), c) spectrograms to achieve a reasonable trade-off between dimensionality of the data and amount of information (DenseNet), and d) MFCC to follow the previous works focusing on voice and speech modelling, and voice pathology detection (all models).…”
Section: Methodsmentioning
confidence: 99%
“…livered state-of-the-art results in many domains including speech processing. To our best knowledge, despite our previous work [22], there are no other papers using deep learning algorithms for voice pathology detection. Next, we also employ the conventional voice pathology detection approach based on acoustic feature extraction procedure.…”
Section: Introductionmentioning
confidence: 97%
“…Precision (P) shows how many of the pathological voice files classified are relevant, and F1-score (F1) has also been taken into account, calculated as in  It can be seen from Table V that the classifier achieved overall accuracy (ACC) of 88%, 66% and 77% on training dataset, validation dataset and testing dataset respectively. Compared to [11], spectrogram features show greater performance on pathological voice detection than raw timedomain signals. Moreover, the proposed algorithm is shown to be more robust for dealing with large amount of data compared to [6,8,9].…”
Section: Resultsmentioning
confidence: 99%
“…This is questionable compared to [10] using GMM-HMM which achieves 67.00% accuracy when the data amount is large. In [11], Deep Learning has been used for the first time, applying Long Short-Term Memory (LSTM), a type of recurrent neural network and using information from the timedomain axis. However, since pathological voice contains information without regard to time, this model might not be the most proper one for this problem.…”
Section: Convolutional Neural Network For Pathological Voice Detectionmentioning
confidence: 99%