Automatically detecting pathological voice disorders such as vocal cord paralysis or Reinke's edema is a challenging and important medical classification problem. While deep learning techniques have achieved significant progress in speech recognition, there has been comparatively little work on the detection of pathological voice disorders. A novel system for pathological voice detection using a convolutional neural network (CNN) as the basic architecture is presented in this work. The system uses spectrograms of normal and pathological speech recordings as the input to the network. Initially, a convolutional deep belief network (CDBN) is used to pre-train the weights of the CNN; this acts as a generative model that explores the structure of the input data using statistical methods. A CNN is then trained with a supervised back-propagation learning algorithm to fine-tune the weights. It is shown that good classification results can be achieved with a small amount of data using this deep learning approach. A performance analysis of the method is provided using real data from the Saarbruecken Voice Database.
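The abstract does not give the spectrogram parameters used; as a minimal sketch of the preprocessing step, the following computes a log-magnitude spectrogram of a synthetic sustained-vowel-like signal (the signal, sample rate, and window settings are illustrative assumptions, not values from the paper):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000  # assumed sample rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
# Synthetic stand-in for a sustained vowel: 200 Hz fundamental plus harmonics
x = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 5))

# Log-magnitude spectrogram: the 2-D "image" that would be fed to the CNN
f, frames, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=256)
log_spec = 10 * np.log10(Sxx + 1e-10)  # small offset avoids log(0)

print(log_spec.shape)  # (frequency bins, time frames)
```

A real pipeline would compute this per recording and normalize the result before training.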
Acoustic analysis using signal processing tools can be used to extract voice features that distinguish pathological from healthy voices. The proposed work uses spectrograms of voice recordings from a voice database as the input to a Convolutional Neural Network (CNN) for automatic feature extraction and classification of disordered and normal voice. The classifier achieved 88.5%, 66.2% and 77.0% accuracy on the training, validation and testing data sets, respectively, on 482 normal and 482 organic dysphonia speech files. This indicates that the proposed algorithm, evaluated on the Saarbruecken Voice Database, can effectively be used for screening pathological voice recordings.
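The CNN architecture itself is not described in the abstract; as a minimal illustration of the core feature-extraction operation such a network applies to a spectrogram image, here is one convolution-plus-ReLU step written out directly in NumPy (the input and filter are random placeholders, not the paper's learned weights):

```python
import numpy as np

rng = np.random.default_rng(0)
spec = rng.standard_normal((64, 64))  # placeholder log-spectrogram "image"
kernel = rng.standard_normal((3, 3))  # one 3x3 convolutional filter

# Valid (no-padding) 2-D convolution: slide the filter over the spectrogram
H, W = spec.shape
feat = np.zeros((H - 2, W - 2))
for i in range(H - 2):
    for j in range(W - 2):
        feat[i, j] = np.sum(spec[i:i + 3, j:j + 3] * kernel)
feat = np.maximum(feat, 0.0)  # ReLU non-linearity

print(feat.shape)  # (62, 62)
```

Stacking many such filters, with pooling and dense layers on top, yields the kind of classifier the abstract describes; in practice this would be done with a deep learning framework rather than explicit loops.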
This paper presents automatic detection of dysarthria, a motor speech disorder, using extended speech features called Centroid Formants. Centroid Formants are the weighted averages of the formants extracted from a speech signal. This involves extracting the first four formants of a speech signal and averaging their weighted values, where the weights are determined by the peak energies of the formant resonance bands. In the proposed methodology, these centroid formants are used to automatically detect dysarthric speech with a neural network classifier. The experimental data consist of 200 speech samples from 10 dysarthric speakers and 200 speech samples from 10 age-matched healthy speakers, and the results show high classification performance with the neural network. A possible direction for future research is the use of these extended features in speaker identification and recognition of disordered speech.
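The exact formula is not printed in the abstract, but on its reading a centroid formant is the energy-weighted average of the first four formants. A minimal sketch of that computation, assuming hypothetical per-frame formant frequencies and band peak energies (formant extraction itself, e.g. via LPC analysis, is outside this sketch):

```python
import numpy as np

# Hypothetical values for one analysis frame: first four formant
# frequencies (Hz) and the peak energies of their resonance bands
formants = np.array([700.0, 1220.0, 2600.0, 3400.0])
energies = np.array([0.9, 0.6, 0.3, 0.1])

# Centroid formant: energy-weighted average of the four formants
# (assumed reading of the paper's definition)
centroid = np.sum(formants * energies) / np.sum(energies)
print(round(centroid, 1))  # 1306.3
```

Frames with stronger low-formant energy pull the centroid down, which is one way a single scalar can summarize the resonance structure per frame.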