There exist many acoustic parameters employed for pathological assessment tasks, which have served as tools for clinicians to distinguish between normophonic and pathological voices. However, many of these parameters require an appropriate tuning in order to maximize its efficiency. In this work, a group of new and already proposed modulation spectrum (MS) metrics are optimized considering different time and frequency ranges pursuing the maximization of efficiency for the detection of pathological voices. The optimization of the metrics is performed simultaneously in two different voice databases in order to identify what tuning ranges produce a better generalization. The experiments were cross-validated so as to ensure the validity of the results. A third database is used to test the optimized metrics. In spite of some differences, results indicate that the behavior of the metrics in the optimization process follows similar tendencies for the tuning databases, confirming the generalization capabilities of the proposed MS metrics. In addition, the tuning process reveals which bands of the modulation spectra have relevant information for each metric, which has a physical interpretation respecting the phonatory system. Efficiency values up to 90.6% are obtained in one tuning database, while in the other, the maximum efficiency reaches 71.1%. Obtained results also evidence a separability between normophonic and pathological states using the proposed metrics, which can be exploited for voice pathology detection or assessment.
Literature documents the impact of Parkinson’s Disease (PD) on speech but no study has analyzed in detail the importance of the distinct phonemic groups for the automatic identification of the disease. This study presents new approaches that are evaluated in three different corpora containing speakers suffering from PD with two main objectives: to investigate the influence of the different phonemic groups in the detection of PD and to propose more accurate detection schemes employing speech. The proposed methodology uses GMM-UBM classifiers combined with a technique introduced in this paper called phonemic grouping, that permits observation of the differences in accuracy depending on the manner of articulation. Cross-validation results reach accuracies between 85% and 94% with AUC ranging from 0.91 to 0.98, while cross-corpora trials yield accuracies between 75% and 82% with AUC between 0.84 and 0.95, depending on the corpus. This is the first work analyzing the generalization properties of the proposed approaches employing cross-corpora trials and reaching high accuracies. Among the different phonemic groups, results suggest that plosives, vowels and fricatives are the most relevant acoustic segments for the detection of PD with the proposed schemes. In addition, the use of text-dependent utterances leads to more consistent and accurate models.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.