Using the Outputs of Different Automatic Speech Recognition Paradigms for Acoustic- and BERT-Based Alzheimer’s Dementia Detection Through Spontaneous Speech

Pan, Yilin; Mirheidari, Bahman; Harris, Jennifer; Thompson, Jennifer C.; Jones, Matthew; Snowden, Julie S.; Blackburn, Daniel; Christensen, Heidi

doi:10.21437/interspeech.2021-1519

Cited by 27 publications

(27 citation statements)

References 23 publications


“…We benchmark our proposed GPT-3 embedding (Babbage) method against other state-of-the-art AD detection models. The existing methods include the studies from Luz et al [21], Balagopalan & Novikova [8] and Pan et al [31], which all used the ADReSSo Challenge data. The models selected are all trained based on the 10-fold CV and evaluated on the same unseen test set to ensure fair comparison.…”

Section: Results (mentioning)

confidence: 99%


Predicting dementia from spontaneous speech using large language models

Agbavor

Liang

2022

PLOS Digit Health


Language impairment is an important biomarker of neurodegenerative disorders such as Alzheimer’s disease (AD). Artificial intelligence (AI), particularly natural language processing (NLP), has recently been increasingly used for early prediction of AD through speech. Yet, relatively few studies exist on using large language models, especially GPT-3, to aid in the early diagnosis of dementia. In this work, we show for the first time that GPT-3 can be utilized to predict dementia from spontaneous speech. Specifically, we leverage the vast semantic knowledge encoded in the GPT-3 model to generate text embedding, a vector representation of the transcribed text from speech, that captures the semantic meaning of the input. We demonstrate that the text embedding can be reliably used to (1) distinguish individuals with AD from healthy controls, and (2) infer the subject’s cognitive testing score, both solely based on speech data. We further show that text embedding considerably outperforms the conventional acoustic feature-based approach and even performs competitively with prevailing fine-tuned models. Together, our results suggest that GPT-3 based text embedding is a viable approach for AD assessment directly from speech and has the potential to improve early diagnosis of dementia.


“…The models selected are all trained based on the 10-fold CV and evaluated on the same unseen test set to ensure fair comparison. For example, we do not include Model 4 & 5 in Pan et al [31] as the models were trained by holding out 20% of the training set. Instead, we select the best model (Model 2), which was trained using 10-fold CV.…”

Section: Results (mentioning)

confidence: 99%

Predicting dementia from spontaneous speech using large language models

Agbavor

Liang

2022

PLOS Digit Health


“…In addition, some other text-based pre-trained models work well. For example, the accuracies of BERT, part of BERT or BERT-based adaptation models [46,47,54,65] were between 81% and 84.51%. Except for the text-based pre-trained models, audio and image-based pre-trained models also have been explored in speech-based AD detection.…”

Section: Comparisons of Methods for the ADReSS Challenge (mentioning)

confidence: 99%

Deep learning-based speech analysis for Alzheimer’s disease detection: a literature review

Yang

Ding

et al. 2022

Alz Res Therapy


Background: Alzheimer’s disease has become one of the most common neurodegenerative diseases worldwide, which seriously affects the health of the elderly. Early detection and intervention are the most effective prevention methods currently. Compared with traditional detection methods such as traditional scale tests, electroencephalograms, and magnetic resonance imaging, speech analysis is more convenient for automatic large-scale Alzheimer’s disease detection and has attracted extensive attention from researchers. In particular, deep learning-based speech analysis and language processing techniques for Alzheimer’s disease detection have been studied and achieved impressive results. Methods: To integrate the latest research progresses, hundreds of relevant papers from ACM, DBLP, IEEE, PubMed, Scopus, Web of Science electronic databases, and other sources were retrieved. We used these keywords for paper search: (Alzheimer OR dementia OR cognitive impairment) AND (speech OR voice OR audio) AND (deep learning OR neural network). Conclusions: Fifty-two papers were finally retained after screening. We reviewed and presented the speech databases, deep learning methods, and model performances of these studies. In the end, we pointed out the mainstreams and limitations in the current studies and provided a direction for future research.


“…based assistive technologies more natural alternatives [23], [24] even though speech quality is degraded. To this end, in recent years there has been increasing interest in developing ASR technologies that are suitable for dysarthric [9], [25]-[40] and elderly speech [14], [41]-[46].…”

Section: Introduction (mentioning)

confidence: 99%

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

Jin

Deng

Wang

et al. 2022

Preprint


Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech, accurate recognition of dysarthric and elderly speech remains highly challenging tasks to date. It is difficult to collect large quantities of such data for ASR system development due to the mobility issues often found among these users. To this end, data augmentation techniques play a vital role. In contrast to existing data augmentation techniques only modifying the speaking rate or overall shape of spectral contour, fine-grained spectro-temporal differences between dysarthric, elderly and normal speech are modelled using a novel set of speaker dependent (SD) generative adversarial networks (GAN) based data augmentation approaches in this paper. These flexibly allow both: a) temporal or speed perturbed normal speech spectra to be modified and closer to those of an impaired speaker when parallel speech data is available; and b) for non-parallel data, the SVD decomposed normal speech spectral basis features to be transformed into those of a target elderly speaker before being re-composed with the temporal bases to produce the augmented data for state-of-the-art TDNN and Conformer ASR system training. Experiments are conducted on four tasks: the English UASpeech and TORGO dysarthric speech corpora; the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech datasets. The proposed GAN based data augmentation approaches consistently outperform the baseline speed perturbation method by up to 0.91% and 3.0% absolute (9.61% and 6.4% relative) WER reduction on the TORGO and DementiaBank data respectively. Consistent performance improvements are retained after applying LHUC based speaker adaptation.
