Voice diseases have been increasing dramatically in recent times, mainly due to unhealthy social habits and voice abuse. These diseases must be diagnosed and treated at an early stage, especially in the case of larynx cancer. It is widely recognized that vocal and voice diseases do not necessarily cause changes in voice quality as perceived by a listener. Acoustic analysis could be a useful tool to diagnose this type of disease. Preliminary research has shown that voice alterations can be detected by means of Gaussian mixture models and short-term mel cepstral parameters complemented by frame energy together with first and second derivatives. Using the F-ratio and Fisher's discriminant ratio, this paper demonstrates that voice impairments can be detected using both mel cepstral vectors and their first derivative, ignoring the second derivative.
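As an illustration of how a discriminant ratio can rank features such as cepstral coefficients and their derivatives, the sketch below computes a per-feature Fisher discriminant ratio for two classes. It assumes the common two-class form, (difference of class means)^2 / (sum of class variances); the abstract does not state the exact definition used, and the function name and toy data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_discriminant_ratio(class_a, class_b):
    """Per-feature Fisher discriminant ratio for two classes.

    class_a, class_b: lists of equal-length feature vectors (one per frame).
    Returns one score per feature: (mean_a - mean_b)^2 / (var_a + var_b).
    Assumed two-class form; not necessarily the paper's exact definition.
    """
    n_features = len(class_a[0])
    scores = []
    for j in range(n_features):
        a = [frame[j] for frame in class_a]
        b = [frame[j] for frame in class_b]
        gap = (mean(a) - mean(b)) ** 2
        spread = pvariance(a) + pvariance(b)
        if spread > 0:
            scores.append(gap / spread)
        else:
            # Degenerate case: no within-class spread at all.
            scores.append(float("inf") if gap > 0 else 0.0)
    return scores


# Toy data invented for illustration: feature 0 separates the classes,
# feature 1 does not.
normal = [[1.0, 5.0], [1.2, 4.0], [0.8, 6.0]]
pathological = [[3.0, 5.1], [3.2, 4.1], [2.8, 5.9]]
scores = fisher_discriminant_ratio(normal, pathological)
```

A feature with a high score separates the two classes well relative to its within-class spread; features with scores near zero (as second derivatives might be, per the abstract's conclusion) are candidates for removal.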
It is well known that vocal and voice diseases do not necessarily cause perceptible changes in the acoustic voice signal. Acoustic analysis is a useful tool for diagnosing voice diseases and complements other methods based on direct observation of the vocal folds by laryngoscopy. This paper studies two neural-network-based classification approaches applied to the automatic detection of voice disorders. The structures studied are the multilayer perceptron and learning vector quantization, fed with short-term vectors calculated according to the well-known mel-frequency cepstral coefficient parameterization. The paper shows that these architectures allow the detection of voice disorders, including glottic cancer, under highly reliable conditions. Within this context, learning vector quantization proved to be more reliable than the multilayer perceptron architecture, yielding 96% frame accuracy under similar working conditions.
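For readers unfamiliar with learning vector quantization, the following is a minimal sketch of the basic LVQ1 update rule on toy two-dimensional data. It is not the configuration used in the paper: the learning-rate schedule, prototype initialization, function names, and data are all assumptions made for illustration, and real inputs would be cepstral feature vectors per frame.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda i: sum((p - v) ** 2 for p, v in zip(prototypes[i][0], x)),
    )


def train_lvq1(samples, labels, prototypes, lr=0.1, epochs=20):
    """LVQ1: pull the winning prototype toward same-class samples,
    push it away from samples of a different class.

    prototypes: list of (vector, label) pairs; an updated copy is returned.
    """
    protos = [(list(vec), lab) for vec, lab in prototypes]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x, y in zip(samples, labels):
            i = nearest(protos, x)
            vec, lab = protos[i]
            sign = 1.0 if lab == y else -1.0
            protos[i] = ([p + sign * rate * (v - p) for p, v in zip(vec, x)], lab)
    return protos


def classify(protos, x):
    """Label of the nearest prototype."""
    return protos[nearest(protos, x)][1]


# Toy frames invented for illustration.
samples = [[0.0, 0.0], [0.5, 0.2], [0.2, 0.6], [3.0, 3.0], [2.6, 3.2], [3.3, 2.7]]
labels = ["normal"] * 3 + ["pathological"] * 3
initial = [([1.0, 1.0], "normal"), ([2.0, 2.0], "pathological")]
protos = train_lvq1(samples, labels, initial)
```

After training, `classify(protos, frame)` assigns each frame the label of its nearest prototype; a frame-level accuracy like the one reported would be the fraction of frames labeled correctly.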
This paper proposes a new approach to increase the amount of information extracted from speech, with the aim of improving the accuracy of a system developed for the automatic detection of pathological voices. The paper addresses the discrimination capabilities of 11 features extracted using nonlinear analysis of time series. Two of these features are based on conventional nonlinear statistics (largest Lyapunov exponent and correlation dimension), two are based on recurrence and fractal-scaling analysis, and the remaining ones are based on different estimations of the entropy. Moreover, this paper uses a strategy based on combining classifiers to fuse the nonlinear analysis with the information provided by classic parameterization approaches found in the literature (noise parameters and mel-frequency cepstral coefficients). The classification was carried out in two steps using, first, a generative and, later, a discriminative approach. Combining both classifiers, the best accuracy obtained is 98.23% ± 0.001.
In this paper, we propose to quantify the quality of the recorded voice through objective nonlinear measures. Quantification of speech signal quality has traditionally been carried out with linear techniques, since the classical model of voice production is a linear approximation. Nevertheless, nonlinear behaviors in the voice production process have been shown. This paper studies the usefulness of six nonlinear chaotic measures based on nonlinear dynamics theory in the discrimination between two levels of voice quality: healthy and pathological. The studied measures are first- and second-order Rényi entropies, the correlation entropy and the correlation dimension. These measures were obtained from the speech signal in the phase-space domain. The values of the first minimum of the mutual information function and Shannon entropy were also studied. Two databases were used to assess the usefulness of the measures: a multiquality database composed of four levels of voice quality (healthy voice and three levels of pathological voice); and a commercial database (MEEI Voice Disorders) composed of two levels of voice quality (healthy and pathological voices). A classifier based on standard neural networks was implemented in order to evaluate the proposed measures. Global success rates of 82.47% (multiquality database) and 99.69% (commercial database) were obtained.
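To make the entropy measures concrete, here is a minimal sketch of the Rényi entropy of order alpha computed from a discrete probability distribution, with Shannon entropy as the order-1 limit. It is a simplification: it uses a plain amplitude histogram, whereas the abstract states that the measures were obtained in the phase-space domain. The helper names, bin count, and toy signal are assumptions for illustration.

```python
import math
from collections import Counter


def histogram_probs(signal, n_bins=8):
    """Empirical bin probabilities of a signal's amplitude values."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for a constant signal
    counts = Counter(min(int((s - lo) / width), n_bins - 1) for s in signal)
    return [c / len(signal) for c in counts.values()]


def renyi_entropy(probs, order):
    """Rényi entropy in nats: log(sum p^order) / (1 - order).

    order == 1 is treated as the Shannon limit, -sum p log p.
    """
    if order == 1:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return math.log(sum(p ** order for p in probs if p > 0)) / (1 - order)


# Toy signal invented for illustration: four amplitude levels, equally frequent.
signal = [0, 1, 2, 3] * 5
probs = histogram_probs(signal, n_bins=4)
h1 = renyi_entropy(probs, 1)  # first-order (Shannon) entropy
h2 = renyi_entropy(probs, 2)  # second-order Rényi entropy
```

For a uniform distribution all orders coincide (here both equal log 4); for non-uniform distributions the second-order value does not exceed the first-order one.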