This paper proposes a new approach to improve the amount of information extracted from the speech aiming to increase the accuracy of a system developed for the automatic detection of pathological voices. The paper addresses the discrimination capabilities of 11 features extracted using nonlinear analysis of time series. Two of these features are based on conventional nonlinear statistics (largest Lyapunov exponent and correlation dimension), two are based on recurrence and fractal-scaling analysis, and the remaining are based on different estimations of the entropy. Moreover, this paper uses a strategy based on combining classifiers for fusing the nonlinear analysis with the information provided by classic parameterization approaches found in the literature (noise parameters and mel-frequency cepstral coefficients). The classification was carried out in two steps using, first, a generative and, later, a discriminative approach. Combining both classifiers, the best accuracy obtained is 98.23% ± 0.001.
Mel-frequency cepstral coefficients (MFCC) have traditionally been used in speaker identification applications. Their use has been extended to speech quality assessment for clinical applications during the last few years. While the significance of such parameters for such an application may not seem clear at first thought, previous research has demonstrated their robustness and statistical significance and, at the same time, their close relationship with glottal noise measurements. This paper includes a review of this parameterization scheme and it analyzes its performance for voice analysis when patients are differentiated by sex. While it is of common use for establishing normative values for traditional voice descriptors (e.g. pitch, jitter, formants), differentiation by sex had not been tested yet for cepstral analysis of voice with clinical purposes. This paper shows that the automatic detection of laryngeal pathology on voice records based on MFCC can significantly improve its performance by means of this prior differentiation by sex.
The present work describes a new method for the automatic detection of the glottal space from laryngeal images obtained either with high speed or with conventional video cameras attached to a laryngoscope. The detection is based on the combination of several relevant techniques in the field of digital image processing. The image is segmented with a watershed transform followed by a región merging, while the final decisión is taken using a simple linear predictor. This scheme has successfully segmented the glottal space in all the test images used.The method presented can be considered a generalist approach for the segmentation of the glottal space because, in contrast with other methods found in literature, this approach does not need either initialization or finding strict environmental conditions extracted from the images to be processed. Therefore, the main advantage is that the user does not have to outline the región of interest with a mouse click. In any case, some a priori knowledge about the glottal space is needed, but this a priorí knowledge can be considered weak compared to the environmental conditions fixed in former works.
scite is a Brooklyn-based startup that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.