Voice diseases have been increasing dramatically in recent times due mainly to unhealthy social habits and voice abuse. These diseases must be diagnosed and treated at an early stage, especially in the case of larynx cancer. It is widely recognized that vocal and voice diseases do not necessarily cause changes in voice quality as perceived by a listener. Acoustic analysis could be a useful tool to diagnose this type of disease. Preliminary research has shown that the detection of voice alterations can be carried out by means of Gaussian mixture models and short-term mel cepstral parameters complemented by frame energy together with first and second derivatives. This paper, using the F-Ratio and Fisher's discriminant ratio, will demonstrate that the detection of voice impairments can be performed using both mel cesptral vectors and their first derivative, ignoring the second derivative.
It is well known that vocal and voice diseases do not necessarily cause perceptible changes in the acoustic voice signal. Acoustic analysis is a useful tool to diagnose voice diseases being a complementary technique to other methods based on direct observation of the vocal folds by laryngoscopy. Through the present paper two neural-network based classification approaches applied to the automatic detection of voice disorders will be studied. Structures studied are multilayer perceptron and learning vector quantization fed using short-term vectors calculated accordingly to the well-known Mel Frequency Coefficient cepstral parameterization. The paper shows that these architectures allow the detection of voice disorders--including glottic cancer--under highly reliable conditions. Within this context, the Learning Vector quantization methodology demonstrated to be more reliable than the multilayer perceptron architecture yielding 96% frame accuracy under similar working conditions.
This paper proposes a new approach to improve the amount of information extracted from the speech aiming to increase the accuracy of a system developed for the automatic detection of pathological voices. The paper addresses the discrimination capabilities of 11 features extracted using nonlinear analysis of time series. Two of these features are based on conventional nonlinear statistics (largest Lyapunov exponent and correlation dimension), two are based on recurrence and fractal-scaling analysis, and the remaining are based on different estimations of the entropy. Moreover, this paper uses a strategy based on combining classifiers for fusing the nonlinear analysis with the information provided by classic parameterization approaches found in the literature (noise parameters and mel-frequency cepstral coefficients). The classification was carried out in two steps using, first, a generative and, later, a discriminative approach. Combining both classifiers, the best accuracy obtained is 98.23% ± 0.001.
Abstract-In this paper, we propose to quantify the quality of the recorded voice through objective nonlinear measures. Quantification of speech signal quality has been traditionally carried out with linear techniques since the classical model of voice production is a linear approximation. Nevertheless, nonlinear behaviors in the voice production process have been shown. This paper studies the usefulness of six nonlinear chaotic measures based on nonlinear dynamics theory in the discrimination between two levels of voice quality: healthy and pathological. The studied measures are first-and second-order Rényi entropies, the correlation entropy and the correlation dimension. These measures were obtained from the speech signal in the phase-space domain. The values of the first minimum of mutual information function and Shannon entropy were also studied. Two databases were used to assess the usefulness of the measures: a multiquality database composed of four levels of voice quality (healthy voice and three levels of pathological voice); and a commercial database (MEEI Voice Disorders) composed of two levels of voice quality (healthy and pathological voices). A classifier based on standard neural networks was implemented in order to evaluate the measures proposed. Global success rates of 82.47% (multiquality database) and 99.69% (commercial database) were obtained.
A filter bank-based algorithm for ECG compression is developed. The proposed method utilises a nearly-perfect reconstruction cosine modulated filter bank to split the incoming signals into several subband signals that are then quantised through thresholding and Huffman encoded. The advantage of the proposed method is that the threshold is chosen so that the quality of the retrieved signal is guaranteed. It is shown that the compression ratio achieved is an improvement over those obtained by previously reported thresholding-based algorithms.Introduction: Several wavelet-based ECG compression methods have recently been developed that report good performance [1][2][3]. Among the reported algorithms, those based on wavelet coefficient thresholding have been shown to be efficient and yield high compression ratios (CRs) [3]. In this Letter, we present an easy to use and efficient ECG compression scheme based on an M-channel maximally decimated filter bank with a parallel structure [4] that guarantees the quality of the retrieved signal. This method is based on that originally reported in [5]. The proposed method incorporates several innovations, including the quantisation of the subband signal samples with less resolution than the original signal samples and entropy-coding by means of a Huffman coder. As a result, the algorithm performs significantly better than other approaches developed using similar techniques.Tests are carried out using the MIT-BIH Arrhythmia Database and the percentage root-mean-square difference (PRD) measurement criteria to evaluate the quality of the retrieved signal. Accordingly, let x[n] and x[n] be the original and the reconstructed signals. The PRD is then defined as:
This work presents a comparison of different approaches for the detection of murmurs from phonocardiographic signals. Taking into account the variability of the phonocardiographic signals induced by valve disorders, three families of features were analyzed: (a) time-varying & time-frequency features; (b) perceptual; and (c) fractal features. With the aim of improving the performance of the system, the accuracy of the system was tested using several combinations of the aforementioned families of parameters. In the second stage, the main components extracted from each family were combined together with the goal of improving the accuracy of the system. The contribution of each family of features extracted was evaluated by means of a simple k-nearest neighbors classifier, showing that fractal features provide the best accuracy (97.17%), followed by time-varying & time-frequency (95.28%), and perceptual features (88.7%). However, an accuracy around 94% can be reached just by using the two main features of the fractal family; therefore, considering the difficulties related to the automatic intrabeat segmentation needed for spectral and perceptual features, this scheme becomes an interesting alternative. The conclusion is that fractal type features were the most robust family of parameters (in the sense of accuracy vs. computational load) for the automatic detection of murmurs. This work was carried out using a database that contains 164 phonocardiographic recordings (81 normal and 83 records with murmurs). The database was segmented to extract 360 representative individual beats (180 per class).
scite is a Brooklyn-based startup that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.