Background
Speech data for medical research can be collected noninvasively and in large volumes. Speech analysis has shown promise in diagnosing neurodegenerative disease. To leverage speech data effectively, transcription is important, as lexical content carries valuable information. Manual transcription, while highly accurate, limits the scalability and cost savings associated with language-based screening.

Objective
To better understand the use of automatic transcription for classification of neurodegenerative disease, namely Alzheimer disease (AD), mild cognitive impairment (MCI), or subjective memory complaints (SMC) versus healthy controls, we compared automatically generated transcripts against transcripts that underwent manual correction.

Methods
We recruited individuals from a memory clinic (“patients”) with a diagnosis of mild-to-moderate AD (n=44, 30%), MCI (n=20, 13%), or SMC (n=8, 5%), as well as healthy controls (n=77, 52%) living in the community. Participants were asked to describe a standardized picture, read a paragraph, and recall a pleasant life experience. We compared transcripts generated using Google speech-to-text software against manually verified transcripts by examining transcription confidence scores, transcription error rates, and machine learning classification accuracy. For the classification tasks, we used logistic regression, Gaussian naive Bayes, and random forests.

Results
The transcription software showed higher confidence scores (P<.001) and lower error rates (P>.05) for speech from healthy controls compared with patients. Classification models using human-verified transcripts significantly (P<.001) outperformed models using automatically generated transcripts for both spontaneous speech tasks; no difference was observed for the reading task. Manually adding pauses to transcripts had no impact on classification performance, whereas manually correcting the transcripts of both spontaneous speech tasks led to significantly higher performance in the machine learning models.

Conclusions
We found that automatically transcribed speech data could be used to distinguish patients with a diagnosis of AD, MCI, or SMC from controls. We recommend a human verification step to improve the performance of automatic transcripts, especially for spontaneous tasks. Human verification can focus on correcting errors and adding punctuation to transcripts; manual addition of pauses is not needed, which simplifies the verification step and allows large volumes of speech data to be processed more efficiently.
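As a minimal illustration of the transcription error rate comparison described above, the sketch below computes a word error rate (WER) between an automatic transcript and its human-verified reference using token-level Levenshtein distance. The example sentences and the choice of WER as the metric are assumptions for illustration; the abstract does not specify the exact error measure used.

```python
# Minimal sketch: word error rate (WER) between an automatic transcript
# and its human-verified reference, via edit distance over word tokens.
# The transcripts below are illustrative examples, not study data.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution,      # substitute (or match)
                          d[i - 1][j] + 1,   # delete from reference
                          d[i][j - 1] + 1)   # insert from hypothesis
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

manual = "the boy is reaching for the cookie jar"
automatic = "the boy is reading for the cookie jar"
print(f"WER: {word_error_rate(manual, automatic):.2f}")  # 1 substitution / 8 words = 0.12
```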
Alzheimer’s disease (AD) is a progressive neurodegenerative condition that impairs performance across multiple cognitive domains. Preclinical changes in eye movements and language can occur with the disease and progress alongside worsening cognition. In this article, we present the results of a machine learning analysis of a novel multimodal dataset for AD classification. The cohort includes data from two novel tasks not previously assessed in classification models for AD (pupil fixation and description of a pleasant past experience), as well as two established tasks (picture description and paragraph reading). Our dataset comprises language and eye movement data from 79 memory clinic patients with diagnoses of mild-moderate AD, mild cognitive impairment (MCI), or subjective memory complaints (SMC), and 83 older adult controls. Analysis of the individual novel tasks showed classification accuracy comparable to the established tasks, demonstrating their discriminative ability for memory clinic patients. Fusing the multimodal data across tasks yielded the highest overall AUC of 0.83 ± 0.01, indicating that the data from the novel tasks are complementary to the established tasks.
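The cross-task fusion step can be pictured as simple feature concatenation before classification. The following is a hypothetical sketch using scikit-learn with synthetic placeholder features; the per-task feature sets, their dimensions, the random forest classifier, and early (concatenation-based) fusion are all assumptions, since the abstract does not specify the fusion method.

```python
# Hypothetical sketch of cross-task feature fusion, assuming per-participant
# feature vectors for each task have already been extracted. All features
# below are random placeholders, so the printed AUC will hover near 0.5;
# real extracted features would drive it toward the reported performance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 162                                 # 79 patients + 83 controls, as in the cohort above
X_picture = rng.normal(size=(n, 20))    # picture-description features (placeholder)
X_reading = rng.normal(size=(n, 15))    # paragraph-reading features (placeholder)
X_memory  = rng.normal(size=(n, 20))    # pleasant-experience recall features (placeholder)
X_pupil   = rng.normal(size=(n, 10))    # pupil-fixation features (placeholder)
y = np.array([1] * 79 + [0] * 83)       # 1 = memory clinic patient, 0 = control

# Early fusion: concatenate task-level features into one vector per participant.
X_fused = np.hstack([X_picture, X_reading, X_memory, X_pupil])

aucs = cross_val_score(RandomForestClassifier(random_state=0),
                       X_fused, y, cv=10, scoring="roc_auc")
print(f"fused AUC: {aucs.mean():.2f} +/- {aucs.std():.2f}")
```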
Background
Clinical trials of disease-modifying therapies for Alzheimer’s disease (AD) are increasingly focused on recruiting individuals with preclinical or early-stage disease. Artificial intelligence may help enrich clinical trial populations with high-risk individuals. We analyzed prospectively collected speech and eye-tracking data to distinguish individuals with mild-moderate AD, mild cognitive impairment (MCI), and subjective memory complaints (SMC) from age- and sex-matched healthy volunteers.

Method
Individuals with known clinical diagnoses of AD, MCI, and SMC from a specialty clinic, and healthy controls from the community, were prospectively recruited. Participants described the “Cookie Theft” picture from the Boston Diagnostic Aphasia Examination. Their speech was recorded and manually transcribed, and eye movements were assessed using an infrared eye-tracker. Data underwent feature extraction for language-related features, including lexical and acoustic parameters (from transcripts and speech), and for fixations and saccades (from eye-tracking). Additionally, features capturing spatial neglect were explored for both language and eye-tracking, following the approach in [1]. Separate predictive models were examined using logistic regression (LR), K-nearest neighbours (KNN), and random forests (RF).

Result
Recruitment and analysis are ongoing. We report data from 34 clinic patients (9% SMC, 35% MCI, 56% mild-moderate AD; mean age 73 ± 10; 44% female) and 39 controls (mean age 68 ± 11; 79% female). The best speech-based models distinguished patients from controls with an area under the ROC curve (AUC) of 0.79 (95% CI 0.70-0.87). The best eye-tracking-based models distinguished patients from controls with an AUC of 0.71 (95% CI 0.59-0.82).

Conclusion
Machine-learning-mediated analysis of speech and eye-tracking data achieved promising classification performance with the current cohort. Increasing the size of the corpus through ongoing recruitment, and combining demographic and clinical data alongside multimodal feature fusion, may help improve classification.
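The three classifiers named in the Method can be compared with cross-validated AUC in a few lines. The sketch below uses synthetic stand-in features; the feature scaling, hyperparameters, and cross-validation scheme are assumptions, as the abstract does not report them.

```python
# Hedged sketch of the LR / KNN / RF model comparison, assuming a feature
# matrix X (speech or eye-tracking features per participant) and labels y.
# Random placeholder features stand in for the extracted features here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(73, 30))        # 34 patients + 39 controls (placeholder features)
y = np.array([1] * 34 + [0] * 39)    # 1 = clinic patient, 0 = control

models = {
    "LR":  make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "RF":  RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean cross-validated AUC = {auc:.2f}")
```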
Background
Clinical trials investigating novel disease-modifying therapies for Alzheimer’s disease (AD) are increasingly targeting participants with preclinical or early-stage neurodegeneration. Artificial intelligence may improve ascertainment of these individuals and thus enrich clinical trial cohorts. We examined the classification accuracy of machine learning analysis of speech and gaze data in distinguishing memory clinic patients from controls.

Method
We recruited individuals with a clinical diagnosis of AD, mild cognitive impairment (MCI), or subjective memory complaints (SMC) from a subspecialty memory clinic, and controls from the community. Clinical diagnosis was ascertained by trained behavioural neurologists aided by cognitive tests and neuroimaging. Participants read a paragraph from the International Reading Speed Texts (IReST). Speech was recorded, automatically transcribed, and manually verified. Eye movements were recorded with an infrared eye-tracker. Extracted features included lexical and acoustic parameters from speech, fixation- and saccade-related features from gaze, and novel multimodal features leveraging speech and gaze signals in combination. We explored predictive models using logistic regression, Gaussian Naïve Bayes classifiers, and random forests.

Result
Here we report baseline IReST task data from 60 clinic patients (12% SMC, 30% MCI, 58% AD; mean age 73 ± 9; 52% female) and 66 controls (mean age 65 ± 9; 68% female). The best speech-based models distinguished patients from controls with an area under the ROC curve (AUC) of 0.75 (95% CI 0.72-0.78). The best gaze models yielded an AUC of 0.73 (95% CI 0.71-0.75). Models integrating speech and gaze data yielded the best results, with an AUC of 0.78 (95% CI 0.76-0.80). We have previously reported data from a separate task in which participants described the “Cookie Theft” picture from the Boston Diagnostic Aphasia Examination while undergoing infrared eye-tracking; the best model combining speech and gaze achieved an AUC of 0.80 (95% CI 0.78-0.92).

Conclusion
Machine-learning-based analysis of speech and gaze data demonstrates promising classification accuracy in distinguishing memory clinic patients from healthy controls, particularly when speech and gaze data are leveraged in combination. We will explore the additional classification accuracy of combining data from multiple tasks.
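Both of the preceding abstracts report AUCs with 95% confidence intervals. One common way to obtain such intervals is bootstrap resampling of a model's held-out scores, sketched below with synthetic labels and scores; the resampling approach, iteration count, and percentile method are assumptions, since neither abstract states how its intervals were computed.

```python
# Hypothetical sketch: 95% CI for an AUC via bootstrap resampling of
# held-out predictions. Labels and scores below are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = np.array([1] * 60 + [0] * 66)                 # 60 patients, 66 controls, as above
y_score = y_true * 0.5 + rng.normal(0, 0.4, size=126)  # synthetic classifier scores

boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), len(y_true))    # resample with replacement
    if len(np.unique(y_true[idx])) < 2:                # resample must contain both classes
        continue
    boot.append(roc_auc_score(y_true[idx], y_score[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUC {roc_auc_score(y_true, y_score):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```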