Parkinson's disease is a neurodegenerative disorder characterized by a variety of motor symptoms. In particular, difficulties starting and stopping movements have been observed in patients. From a technical/diagnostic point of view, these movement changes can be assessed by modeling the transitions between voiced and unvoiced segments in speech, the movement with which the patient starts or stops a stroke in handwriting, and the movement with which the patient starts or stops walking. This study proposes a methodology to model such difficulties in starting or stopping movements, considering information from speech, handwriting, and gait. We used these transitions to train convolutional neural networks to classify patients and healthy subjects. The neurological state of the patients was also evaluated according to different stages of the disease (initial, intermediate, and advanced). In addition, we evaluated the robustness of the proposed approach on speech signals in three different languages: Spanish, German, and Czech. According to the results, the fusion of information from the three modalities is highly accurate for classifying patients and healthy subjects, and it proves suitable for assessing the neurological state of the patients at several stages of the disease. We also interpret the feature maps obtained from the deep learning architectures with respect to the presence or absence of the disease and the neurological state of the patients. To the best of our knowledge, this is one of the first works that considers multimodal information to assess Parkinson's disease following a deep learning approach.
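One simple way to combine the per-modality classifiers described above is late fusion: each modality's network outputs a patient-vs.-control probability, and the probabilities are averaged. The sketch below is illustrative only (the function name, the uniform weights, and the toy probabilities are assumptions, not the paper's exact fusion scheme):

```python
import numpy as np

def late_fuse(prob_speech, prob_handwriting, prob_gait, weights=None):
    """Fuse per-modality PD-vs-control probabilities by a weighted average.

    Each input is an array of shape (n_subjects,) holding the probability
    that a subject is a patient, as produced by one per-modality CNN.
    """
    probs = np.stack([prob_speech, prob_handwriting, prob_gait])  # (3, n)
    if weights is None:
        weights = np.ones(3) / 3.0          # uniform fusion by default
    weights = np.asarray(weights) / np.sum(weights)
    return weights @ probs                  # fused probability per subject

# toy example: three modalities, four subjects (invented numbers)
fused = late_fuse(np.array([0.9, 0.2, 0.6, 0.4]),
                  np.array([0.8, 0.3, 0.7, 0.5]),
                  np.array([0.7, 0.1, 0.8, 0.3]))
labels = (fused >= 0.5).astype(int)         # 1 = predicted patient
```

Weighted fusion lets a more reliable modality (e.g., speech) dominate the decision; with uniform weights it reduces to a plain average of the three probabilities.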
Change in voice quality (VQ) is one of the first precursors of Parkinson's disease (PD). Specifically, impaired phonation and articulation cause the patient to have a breathy, husky, semi-whispered, and hoarse voice. A goal of this paper is to characterise a VQ spectrum (the composition of non-modal phonations) of voice in PD. The paper relates non-modal healthy phonations (breathy, creaky, tense, falsetto, and harsh) to disordered phonation in PD. First, statistics are learned to differentiate modal and non-modal phonations. The statistics are computed from phonological posteriors, the probabilities of phonological features inferred from the speech signal using a deep learning approach. Second, statistics of disordered speech are learned from PD speech data comprising 50 patients and 50 healthy controls. Third, the Euclidean distance is used to measure the similarity of the non-modal and disordered statistics, and the inverse of the distances is used to obtain the composition of non-modal phonation in PD. Thus, pathological voice quality is characterised using a healthy non-modal voice quality "base/eigenspace". The obtained results are interpreted as the voice of an average patient with PD, which can be characterised by a voice quality spectrum composed of 30% breathy voice, 23% creaky voice, 20% tense voice, 15% falsetto voice, and 12% harsh voice. In addition, the proposed features were applied to the prediction of the dysarthria level according to the Frenchay assessment score related to the larynx, and a significant improvement is obtained for the read speech task. The proposed characterisation of VQ might also be applied to other kinds of pathological speech.
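The third step (inverse-distance composition) can be sketched in a few lines. The statistics vectors below are invented for illustration; in the paper they are learned phonological-posterior statistics of each non-modal phonation and of the PD speech data:

```python
import numpy as np

# Illustrative statistics vectors (e.g., mean phonological posteriors);
# the actual vectors are learned from speech data, not hand-picked.
nonmodal = {
    "breathy":  np.array([0.7, 0.2, 0.1]),
    "creaky":   np.array([0.3, 0.6, 0.2]),
    "tense":    np.array([0.2, 0.3, 0.7]),
    "falsetto": np.array([0.8, 0.1, 0.4]),
    "harsh":    np.array([0.4, 0.5, 0.6]),
}
pd_stats = np.array([0.6, 0.3, 0.2])   # hypothetical "average patient" vector

# inverse Euclidean distance, normalised so the composition sums to 1
inv = {k: 1.0 / np.linalg.norm(pd_stats - v) for k, v in nonmodal.items()}
total = sum(inv.values())
composition = {k: v / total for k, v in inv.items()}
```

The phonation whose statistics lie closest to the disordered statistics receives the largest share, which is how a spectrum such as "30% breathy, 23% creaky, ..." arises from plain distances.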
The m-FDA scale was introduced to assess the dysarthria level of patients with PD. Articulation features extracted from continuous speech signals to create i-vectors were the most accurate at quantifying the dysarthria level, with correlations of up to 0.69 between the predicted m-FDA scores and those assigned by the phoniatricians. When the dysarthria levels were estimated from dedicated speech exercises such as the rapid repetition of syllables (diadochokinetic, DDK, tasks) and read texts, the correlations were 0.64 and 0.57, respectively. In addition, combining several feature sets and speech tasks improved the results, which validates the hypothesis that information from different tasks and feature sets contributes to the assessment of dysarthric speech signals. The speaker models seem promising for individual modeling to monitor the dysarthria level of PD patients. The proposed approach may help clinicians make more accurate and timely decisions about the evaluation and therapy associated with the dysarthria level of patients, and it is a step towards unobtrusive/ecological evaluation of patients with dysarthric speech without the need to attend medical appointments.
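The correlations quoted above are Pearson correlations between predicted and clinician-assigned m-FDA scores; a minimal sketch of that evaluation step follows (the score values are invented for illustration):

```python
import numpy as np

def pearson_r(pred, target):
    """Pearson correlation between predicted and clinician-assigned scores."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    pc, tc = pred - pred.mean(), target - target.mean()
    return float((pc @ tc) / (np.linalg.norm(pc) * np.linalg.norm(tc)))

# toy check: hypothetical predicted m-FDA scores vs. clinician scores
r = pearson_r([10, 20, 30, 40], [12, 18, 33, 39])
```

A correlation near 1 indicates the predicted scores rank and scale patients the way the phoniatricians do; values such as 0.69 mean the prediction explains roughly half of the variance in the clinical scores.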
Information from different bio-signals such as speech, handwriting, and gait has been used to monitor the state of Parkinson's disease (PD) patients; however, all of these bio-signals may not always be available. We propose a method based on multi-view representation learning via generalized canonical correlation analysis (GCCA) for learning a representation of features extracted from handwriting and gait that can be used as a complement to speech-based features. Three problems are addressed: the classification of PD patients vs. healthy controls, the prediction of the neurological state of PD patients according to the UPDRS score, and the prediction of a modified version of the Frenchay dysarthria assessment (m-FDA). According to the results, the proposed approach improves the results in the addressed problems, especially in the prediction of the UPDRS and m-FDA scores.
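A common formulation of GCCA (MAXVAR) finds a consensus representation G that all views can project onto; a minimal NumPy sketch under that formulation is shown below. This is a generic GCCA illustration, not the paper's exact pipeline, and the view dimensions and data are synthetic:

```python
import numpy as np

def gcca(views, k):
    """MAXVAR generalised CCA: find a shared representation G (n x k)
    maximally correlated with linear projections of every view.

    views: list of (n x d_i) feature matrices for the same n subjects
    (e.g., speech, handwriting, and gait features). Returns G and the
    per-view linear maps W_i such that views[i] @ W_i approximates G.
    """
    n = views[0].shape[0]
    proj_sum = np.zeros((n, n))
    centered = []
    for X in views:
        Xc = X - X.mean(axis=0)            # column-centre each view
        Q, _ = np.linalg.qr(Xc)            # orthonormal basis of the view
        proj_sum += Q @ Q.T                # accumulate projection matrices
        centered.append(Xc)
    # top-k eigenvectors of the summed projections give the consensus G
    vals, vecs = np.linalg.eigh(proj_sum)
    G = vecs[:, ::-1][:, :k]
    W = [np.linalg.pinv(Xc) @ G for Xc in centered]
    return G, W

# synthetic demo: three views generated from a 2-D shared latent signal
rng = np.random.default_rng(0)
shared = rng.normal(size=(50, 2))
views = [shared @ rng.normal(size=(2, d)) + 0.01 * rng.normal(size=(50, d))
         for d in (5, 8, 6)]
G, W = gcca(views, k=2)
```

Once G and the maps W_i are learned, handwriting- and gait-derived projections can stand in for missing speech features, which is the complementarity the abstract describes.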
Speech impairments are one of the earliest manifestations in patients with Parkinson's disease. In particular, articulation deficits related to the capability of the speaker to start or stop the vibration of the vocal folds have been observed in the patients. These difficulties can be assessed by modeling the transitions between voiced and unvoiced segments of speech. A robust strategy to model the articulatory deficits related to starting or stopping the vibration of the vocal folds is proposed in this study. The transitions between voiced and unvoiced segments are modeled by a convolutional neural network that extracts suitable information from two time-frequency representations: the short-time Fourier transform and the continuous wavelet transform. The proposed approach improves the results previously reported in the literature. Accuracies of up to 89% are obtained for the classification of Parkinson's patients vs. healthy speakers. This study is a step towards the robust modeling of speech impairments in patients with neurodegenerative disorders.
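The first step of such a pipeline is locating voiced/unvoiced boundaries and cutting fixed-length chunks around them, which are then converted to time-frequency representations for the CNN. The sketch below uses a crude short-time-energy voicing decision as a stand-in for a proper voicing detector, and the frame and chunk lengths are illustrative assumptions:

```python
import numpy as np

def vuv_transitions(signal, fs, frame_ms=40, chunk_ms=160):
    """Locate voiced/unvoiced boundaries by a short-time energy threshold
    (a simple stand-in for a real voicing detector) and cut a fixed-length
    chunk of signal centred on each transition.
    """
    frame = int(fs * frame_ms / 1000)
    half = int(fs * chunk_ms / 1000) // 2
    n_frames = len(signal) // frame
    energy = np.array([np.sum(signal[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n_frames)])
    voiced = energy > 0.5 * energy.mean()       # crude V/UV decision
    chunks = []
    for i in range(1, n_frames):
        if voiced[i] != voiced[i - 1]:          # boundary between frames
            c = i * frame                       # boundary sample index
            if c - half >= 0 and c + half <= len(signal):
                chunks.append(signal[c - half:c + half])
    return chunks

# synthetic example: silence, a 100 Hz "voiced" tone, silence
fs = 16000
t = np.arange(fs) / fs
sig = np.concatenate([np.zeros(fs // 4),
                      0.5 * np.sin(2 * np.pi * 100 * t[:fs // 2]),
                      np.zeros(fs // 4)])
chunks = vuv_transitions(sig, fs)
```

Each extracted chunk would then be turned into an STFT and a continuous-wavelet-transform image, and those two representations fed to the CNN as described above.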
Many features can be extracted from speech signals for applications such as automatic speech recognition or speaker verification. For pathological speech processing, however, there is a need to extract features about the presence of the disease or the state of the patients that are comprehensible to clinical experts. Phonological posteriors are a group of features that are interpretable by clinicians and at the same time carry suitable information about the patient's speech. This paper presents a tool to extract phonological posteriors directly from speech signals. The proposed method consists of a bank of parallel bidirectional recurrent neural networks that estimate the posterior probabilities of the occurrence of different phonological classes. The proposed models detect the phonological classes with accuracies above 90%. In addition, the trained models are available to the research community interested in the topic.
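The "bank of parallel detectors" idea can be illustrated without the recurrent machinery: one independent binary model per phonological class, each mapping a frame feature vector to a posterior probability. The class names, weights, and feature dimension below are invented stand-ins for the paper's trained bidirectional recurrent networks:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class PhonologicalBank:
    """A bank of parallel binary detectors, one per phonological class
    (a minimal stand-in for the paper's bidirectional recurrent networks).
    Each detector maps a frame feature vector to the posterior probability
    that its class is active in that frame.
    """
    def __init__(self, weights, biases, classes):
        self.W = np.asarray(weights)    # (n_classes, feat_dim), illustrative
        self.b = np.asarray(biases)     # (n_classes,)
        self.classes = classes

    def posteriors(self, frames):
        """frames: (n_frames, feat_dim) -> posteriors (n_frames, n_classes)."""
        return sigmoid(frames @ self.W.T + self.b)

rng = np.random.default_rng(1)
classes = ["vocalic", "nasal", "fricative"]        # example class names
bank = PhonologicalBank(rng.normal(size=(3, 8)), np.zeros(3), classes)
post = bank.posteriors(rng.normal(size=(100, 8)))  # 100 synthetic frames
```

Because each column of the posterior matrix corresponds to a named phonological class, summary statistics over frames (means, variances) stay directly readable for clinicians, which is the interpretability argument made above.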