Learning Language and Acoustic Models for Identifying Alzheimer’s Dementia From Speech

Shah, Zehra; Sawalha, Jeffrey; Tasnim, Mashrura; Qi, Shi-ang; Stroulia, Eleni; Greiner, Russell

doi:10.3389/fcomp.2021.624659

Cited by 27 publications

(30 citation statements)

References 19 publications

“…Regarding BERT + ViT + Co-Attention, it improves the RMSE scores of all the existing research initiatives, except Bimodal Network (Ensembled Output) (Koo et al, 2020), by 0.14–1.41. In terms of the Multimodal BERT - eGeMAPS, Multimodal BERT - ViT, and Multimodal BERT - eGeMAPS + ViT, it seems that these architectures are rather complex for the MMSE regression task improving the RMSE score of only one research work (Shah et al, 2021).…”

Section: Results (mentioning)

confidence: 99%

Multimodal Deep Learning Models for Detecting Dementia From Speech and Transcripts

Ilias

Askounis

2022

Front. Aging Neurosci.

Alzheimer's dementia (AD) entails negative psychological, social, and economic consequences not only for the patients but also for their families, relatives, and society in general. Despite the significance of this phenomenon and the importance for an early diagnosis, there are still limitations. Specifically, the main limitation is pertinent to the way the modalities of speech and transcripts are combined in a single neural network. Existing research works add/concatenate the image and text representations, employ majority voting approaches or average the predictions after training many textual and speech models separately. To address these limitations, in this article we present some new methods to detect AD patients and predict the Mini-Mental State Examination (MMSE) scores in an end-to-end trainable manner consisting of a combination of BERT, Vision Transformer, Co-Attention, Multimodal Shifting Gate, and a variant of the self-attention mechanism. Specifically, we convert audio to Log-Mel spectrograms, their delta, and delta-delta (acceleration values). First, we pass each transcript and image through a BERT model and Vision Transformer, respectively, adding a co-attention layer at the top, which generates image and word attention simultaneously. Secondly, we propose an architecture, which integrates multimodal information to a BERT model via a Multimodal Shifting Gate. Finally, we introduce an approach to capture both the inter- and intra-modal interactions by concatenating the textual and visual representations and utilizing a self-attention mechanism, which includes a gate model. Experiments conducted on the ADReSS Challenge dataset indicate that our introduced models demonstrate valuable advantages over existing research initiatives achieving competitive results in both the AD classification and MMSE regression task. 
Specifically, our best performing model attains an accuracy of 90.00% and a Root Mean Squared Error (RMSE) of 3.61 in the AD classification task and MMSE regression task, respectively, achieving a new state-of-the-art performance in the MMSE regression task.
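
The abstract above mentions turning audio into Log-Mel spectrograms together with their delta and delta-delta values. The following is only a minimal sketch of that stacking step, not the authors' pipeline: it assumes the log-Mel spectrogram is already available as a list of frequency-band rows, and it uses a simple central difference with replicated edges as a stand-in for whatever delta filter the paper actually used. The function names `delta` and `stack_with_deltas` are hypothetical.

```python
def delta(rows):
    """Approximate the time derivative of each frequency band.

    Assumption: a central difference with edge values replicated,
    chosen for simplicity; real toolkits often use a wider smoothing filter.
    """
    out = []
    for row in rows:
        n = len(row)
        band = []
        for t in range(n):
            prev = row[max(t - 1, 0)]      # replicate the first frame at the left edge
            nxt = row[min(t + 1, n - 1)]   # replicate the last frame at the right edge
            band.append((nxt - prev) / 2.0)
        out.append(band)
    return out


def stack_with_deltas(log_mel):
    """Return three channels: the static spectrogram, its delta, and its delta-delta."""
    first = delta(log_mel)
    second = delta(first)
    return [log_mel, first, second]


# Toy input: one frequency band that rises linearly over five frames.
toy_log_mel = [[0.0, 1.0, 2.0, 3.0, 4.0]]
channels = stack_with_deltas(toy_log_mel)
print(len(channels), len(channels[0]), len(channels[0][0]))
```

The three channels have the same height and width, so they can be treated like the colour channels of an image, which is presumably what makes the representation usable as input to an image model.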

“…Shah et al (2021) used also an ensemble method to predict AD patients. Specifically, after training acoustic and language models, they chose the three best performing acoustic models and the best performing language model.…”

Section: Related Work (mentioning)

confidence: 99%

Multimodal Deep Learning Models for Detecting Dementia From Speech and Transcripts

Ilias

Askounis

2022

Front. Aging Neurosci.

“…Table 3 shows the ranked features from both feature sets, together with their Pearson's correlation (r) with the diagnosis class; due to space limitations, we only show the top 10 features. The most significant acoustic feature was LogHNR, known to be important in acoustic analysis for the diagnosis of pathological voices; loudness, raw fundamental frequency, variation in jitter, intensity, and LogHNR all positively correlate with AD and have been reported as useful features in literature for Dementia [31,9]. Among interactional features, lapses are positively correlated with AD, indicating that patients find trouble continuing topics, resulting in delays with interviewers initiating a new topic.…”

Section: Results (mentioning)

confidence: 99%
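
The statement above describes ranking features by their Pearson's correlation (r) with the diagnosis class. A rough sketch of such a ranking is given below. It is an illustration only: the feature names and values are made up, the diagnosis class is assumed to be coded as 0 (non-AD) and 1 (AD), and sorting by the absolute value of r is an assumption about how the ranking was ordered.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant feature carries no linear signal
    return cov / (spread_x * spread_y)


def rank_features(features, labels, top_k=10):
    """Return the top_k (name, r) pairs, ordered by |r| (assumed ordering)."""
    scored = [(name, pearson_r(values, labels)) for name, values in features.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:top_k]


# Hypothetical toy data: four speakers, label 1 = AD, 0 = non-AD.
labels = [0, 0, 1, 1]
features = {
    "log_hnr": [1.0, 2.0, 3.0, 4.0],
    "lapses": [0.0, 0.0, 1.0, 1.0],
    "noise": [1.0, 0.0, 1.0, 0.0],
}
ranking = rank_features(features, labels)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

A positive r means the feature tends to be larger for the AD class under this coding, which matches how the statement reads "positively correlate with AD".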

“…Its highest incidence is among adults due to age as a risk factor: one in every six individuals over the age of 80 is likely to develop AD and the number of cases over the age of 60 is doubling every 45 years [1]. Early recognition of cognitive decline could be helpful in managing pre-stage AD thus allowing better quality of life for elderly patients and their caregivers [2].…”

Section: Introduction (mentioning)

confidence: 99%

“…Some work has taken a multimodal approach to AD classification: Campbell et al [14] examined two fusion strategies with linguistic features and acoustic features, achieving 75% accuracy. Shah et al [2] used a weighted majority-vote ensemble algorithm for classification and chose the best performing language model with the three best performing acoustic models, giving final prediction accuracy of 83%.…”

Section: Introduction (mentioning)

confidence: 99%
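
The weighted majority-vote ensemble mentioned in the statement above can be sketched roughly as follows. This is a generic illustration, not the procedure from Shah et al: the labels, the four model outputs (one language model plus three acoustic models), and the weights are all hypothetical, and the statement does not say how the weights were chosen.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting models have the largest total weight.

    Ties go to the label that appears first in `predictions`.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical outputs for one speaker: language model first, then three acoustic models.
predictions = ["AD", "Control", "Control", "AD"]
weights = [0.4, 0.2, 0.2, 0.2]  # assumed weights, e.g. favouring the language model
decision = weighted_majority_vote(predictions, weights)
print(decision)
```

With these assumed weights the two models voting "AD" carry more total weight than the two voting "Control", so the ensemble follows them even though the raw vote count is tied.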

Detecting Alzheimer’s Disease Using Interactional and Acoustic Features from Spontaneous Speech

Nasreen

Hough

Purver

2021

Interspeech 2021

Alzheimer's Disease (AD) is a form of Dementia that manifests in cognitive decline including memory, language, and changes in behavior. Speech data has proven valuable for inferring cognitive status, used in many health assessment tasks, and can be easily elicited in natural settings. Much work focuses on analysis using linguistic features; here, we focus on non-linguistic features and their use in distinguishing AD patients from similar-age Non-AD patients with other health conditions in the Carolinas Conversation Collection (CCC) dataset. We used two types of features: patterns of interaction including pausing behaviour and floor control, and acoustic features including pitch, amplitude, energy, and cepstral coefficients. Fusion of the two kinds of features, combined with feature selection, obtains very promising classification results: classification accuracy of 90% using standard models such as support vector machines and logistic regression. We also obtain promising results using interactional features alone (87% accuracy), which can be easily extracted from natural conversations in daily life and thus have the potential for future implementation as a noninvasive method for AD diagnosis and monitoring.

AI‐based assessments of speech and language impairments in dementia

Parsapoor

2023

Alzheimer's & Dementia

Recent advancements in the artificial intelligence (AI) domain have revolutionized the early detection of cognitive impairments associated with dementia. This has motivated clinicians to use AI‐powered dementia detection systems, particularly systems developed based on individuals' and patients' speech and language, for a quick and accurate identification of patients with dementia. This paper reviews articles about developing assessment tools using machine learning and deep learning algorithms trained by vocal and textual datasets.
