Interspeech 2018
DOI: 10.21437/interspeech.2018-1351

A Deep Learning Method for Pathological Voice Detection Using Convolutional Deep Belief Networks

Abstract: Automatically detecting pathological voice disorders such as vocal cord paralysis or Reinke's edema is a challenging and important medical classification problem. While deep learning techniques have achieved significant progress in the field of speech recognition, there has been less research in the area of pathological voice disorder detection. A novel system for pathological voice detection using a convolutional neural network (CNN) as the basic architecture is presented in this work. The novel system uses s…

Cited by 72 publications (43 citation statements) · References 8 publications
“…For every database, 70% of the data is used in training, 20% is used in testing and the remaining 10% of the speech data is used for validation. This type of data partition has been followed in several previous detection studies related both to traditional pipeline [76] and end-to-end [32], [33] systems. For UA-Speech and TORGO, the database is split in order to maintain a good partition of speakers with different severities or intelligibility scores between the training, validation, and test sets, without having any overlap in speakers between the different sets.…”
Section: B. Experimental Setup
confidence: 99%
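The 70/20/10 speaker-independent partition described above can be sketched in plain Python. This is a minimal illustration, not the cited authors' code; the function name and the speaker-ID-to-utterances mapping are assumptions:

```python
import random

def speaker_independent_split(samples, train=0.7, test=0.2, seed=0):
    """Partition utterances into train/test/validation sets (70/20/10 by
    speaker) with no speaker overlap between sets.
    `samples` maps a speaker ID to that speaker's list of utterances.
    Hypothetical helper for illustration only."""
    speakers = sorted(samples)
    random.Random(seed).shuffle(speakers)      # deterministic shuffle
    n = len(speakers)
    n_train = round(n * train)
    n_test = round(n * test)
    split_ids = {
        "train": speakers[:n_train],
        "test": speakers[n_train:n_train + n_test],
        "val": speakers[n_train + n_test:],    # remainder (~10%)
    }
    # Flatten speaker IDs back into utterance lists per partition.
    return {name: [u for spk in ids for u in samples[spk]]
            for name, ids in split_ids.items()}
```

Splitting by speaker rather than by utterance is what prevents the same voice from appearing in both training and test data, which would otherwise inflate detection scores.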
“…In studying pathological voice detection with end-to-end systems, previous studies have used either the raw time-domain speech signal or its spectrum to train deep learning models [31]-[35]. To develop deep learning models, existing studies have mainly used combinations of convolutional neural networks (CNN) and multilayer perceptrons (MLP) [31], [33]-[37]. In addition, some studies have explored combining a CNN with long short-term memory (LSTM) networks [32], and combining LSTM and MLP [38], for the detection of pathological voice from healthy speech.…”
Section: Introduction
confidence: 99%
“…Pathological voice disorder, due to vocal cord paralysis or Reinke's edema, is investigated in [112]. In the paper, Figure 16…”
Section: E. The Spectrogram Features
confidence: 99%
“…Noise reduction: background noise is reduced based on the spectral gating algorithm implemented in the SoX codec.³ The core idea of the algorithm is to attenuate those segments of the signal whose spectral energy falls below certain thresholds, which are obtained by computing the mean power in each frequency band from the STFT of a noise profile extracted from a silent region of the speech signal.…”
Section: Preprocessing
confidence: 99%
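The spectral-gating idea quoted above can be sketched in a few lines of NumPy: per-frequency thresholds come from the mean spectral magnitude of a noise profile, and STFT bins below the threshold are attenuated. This is an illustrative toy, not the SoX implementation; the parameter names, the attenuation factor, and the threshold margin are assumptions:

```python
import numpy as np

def spectral_gate(signal, noise, n_fft=512, hop=128, atten=0.1, margin=1.5):
    """Toy spectral gating: attenuate STFT bins whose magnitude falls below
    a per-frequency threshold derived from a noise profile."""
    win = np.hanning(n_fft)

    def stft(x):
        frames = [x[i:i + n_fft] * win
                  for i in range(0, len(x) - n_fft + 1, hop)]
        return np.fft.rfft(np.array(frames), axis=1)

    S = stft(signal)                                  # (frames, freq bins)
    noise_mag = np.abs(stft(noise)).mean(axis=0)      # per-frequency mean magnitude
    keep = np.abs(S) >= margin * noise_mag            # bins above threshold
    S_gated = np.where(keep, S, atten * S)            # attenuate the rest
    # Weighted overlap-add inverse STFT.
    out = np.zeros(len(signal))
    wsum = np.zeros(len(signal))
    for k, frame in enumerate(np.fft.irfft(S_gated, n=n_fft, axis=1)):
        out[k * hop:k * hop + n_fft] += frame * win
        wsum[k * hop:k * hop + n_fft] += win ** 2
    return out / np.maximum(wsum, 1e-8)
```

The noise profile would typically be a silent stretch of the same recording, matching the extraction step described in the quoted passage.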
“…After the convolution operation, the resulting feature maps contain low- and high-level features representing the acoustic information of the signals. Many works have shown the advantages of using CNNs and spectrograms in different speech processing applications, such as automatic detection of disordered speech [2]-[4], acoustic models for automatic speech recognition systems [5], [6], and emotion detection [7], among others. These studies, however, consider single-channel spectrograms to obtain the feature maps, e.g., the short-time Fourier transform (STFT) is applied to the audio signal and the resulting spectrogram is used as input to the model.…”
Section: Introduction
confidence: 99%
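As a toy illustration of the convolution step these studies describe, a single-channel "valid" 2-D convolution over a spectrogram producing one feature map might look as follows. This is a hypothetical sketch; real systems use a deep-learning framework and many kernels per layer:

```python
import numpy as np

def conv2d_valid(spec, kernel):
    """Slide one kernel over a (freq, time) spectrogram with 'valid'
    padding, then apply ReLU, yielding a single feature map.
    Illustrative only; frameworks vectorize and batch this."""
    H, W = spec.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Elementwise product of the kernel with one spectrogram patch.
            out[i, j] = np.sum(spec[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)  # ReLU nonlinearity
```

A CNN layer simply repeats this with many learned kernels, and the stack of resulting feature maps carries the low- and high-level acoustic features the quoted passage refers to.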