Conformer Based Elderly Speech Recognition System for Alzheimer’s Disease Detection

Wang, Tianzi; Deng, Jiajun; Ye, Zi; Hu, Shoukang; Wang, Yi; Cui, Meng; Jin, Zengrui; Liu, Xunying; Meng, Helen

doi:10.21437/interspeech.2022-712

Cited by 11 publications

(6 citation statements)

References 0 publications


“…It is noticed that when using the ground truth transcripts rather than ASR outputs, a comparable or worse performance was obtained with a F1 score of 87%. Wang et al. (2022a) employed ASR optimization using neural architecture search, cross-domain adaptation and fine-grained elderly speaker adaptation and multi-pass rescoring based system combination with hybrid TDNN.…”

Section: Methods (mentioning)

confidence: 99%

“…For example, Sarawgi et al. (2020) extracted three diverse features and used model fusion strategies, resulting in an accuracy of 88% on Pitt dataset and 83.3% on the ADReSS dataset. Wang et al. (2022a) employed ASR optimization and model fusion strategies based on BERT and RoBERTa features. As a result, the paper achieved state-of-the-art performance with a F1 score of 92% on the Pitt dataset.…”

Section: Methods (mentioning)

confidence: 99%


Noninvasive automatic detection of Alzheimer's disease from spontaneous speech: a review

Qi, Zhou, Dong et al. 2023

Front. Aging Neurosci.

Alzheimer's disease (AD), characterized by memory degradation and language impairment, is considered one of the leading causes of death among people over the age of 70. Because of the language dysfunction observed in individuals with AD, speech-based methods offer non-invasive, convenient, and cost-effective solutions for the automatic detection of AD. This paper systematically reviews the technologies for detecting the onset of AD from spontaneous speech, including data collection, feature extraction and classification. First, the paper formulates the task of automatic AD detection and describes the process of data collection. Then, feature extractors for speech data and transcripts are reviewed, which mainly comprise acoustic features from speech and linguistic features from text. In particular, general handcrafted features and deep embedding features are organized by modality. Additionally, this paper summarizes optimization strategies for AD detection systems. Finally, the paper addresses challenges related to data size, model explainability, reliability and multimodality fusion, and discusses potential research directions based on these challenges.

“…We conduct our experiments with either ground truth manual transcripts or transcripts generated by ASR system from audios. ASR systems: The experimental results in [29] suggest that the transcripts generated by the adapted hybrid CNN-TDNN ASR system [24] achieve better AD detection performance than those obtained from the adapted E2E Conformer model [38]. Hence, the hybrid CNN-TDNN ASR system is used.…”

Section: Text Data (mentioning)

confidence: 99%

Exploiting prompt learning with pre-trained language models for Alzheimer's Disease detection

Wang, Deng, Wang et al. 2022

Preprint

Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care and delaying further progression. Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques. Textual embedding features produced by pre-trained language models (PLMs) such as BERT are widely used in such systems. However, PLM domain fine-tuning is commonly based on the masked word or sentence prediction costs that are inconsistent with the back-end AD detection task. To this end, this paper investigates the use of prompt-based fine-tuning of PLMs that consistently uses AD classification errors as the training objective function. Disfluency features based on hesitation or pause filler token frequencies are further incorporated into prompt phrases during PLM fine-tuning. To exploit the complementarity between BERT or RoBERTa based PLMs that are either prompt learning fine-tuned or optimized using the conventional masked word or sentence prediction costs, decision voting based system combination between them is further applied. Mean, standard deviation (std) and the maximum among accuracy scores over 15 experiment runs are adopted as performance measurements for the AD detection system. Mean detection accuracies of 84.20% (with std 2.09%, best 87.5%) and 82.64% (with std 4.0%, best 89.58%) were obtained using manual and ASR speech transcripts respectively on the ADReSS20 test set consisting of 48 elderly speakers.
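The two evaluation ideas named in this abstract, decision voting across systems and mean/std/best accuracy over repeated runs, can be illustrated with a minimal sketch. The function names, the tie-breaking rule and the choice of population standard deviation are assumptions made here for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from collections import Counter
from statistics import mean, pstdev


def majority_vote(predictions):
    """Combine the labels that several systems assign to one speaker.

    Assumption: ties go to the label that appears first in `predictions`;
    the abstract does not say how ties are resolved.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


def summarize_runs(accuracies):
    """Report mean, standard deviation and best accuracy over repeated runs.

    Assumption: population standard deviation (pstdev) is used; the paper
    may have used the sample standard deviation instead.
    """
    return {
        "mean": mean(accuracies),
        "std": pstdev(accuracies),
        "best": max(accuracies),
    }


if __name__ == "__main__":
    # Hypothetical labels from three systems for a single speaker.
    print(majority_vote(["AD", "HC", "AD"]))
    # Hypothetical accuracies from a handful of runs.
    print(summarize_runs([0.80, 0.85, 0.90]))
```

Reporting the spread alongside the mean matters on a test set of 48 speakers, where one extra correct decision moves accuracy by roughly two percentage points.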


“…The wav2vec 2.0 models have also been used, for example, for detection of aphasia [22], for detection of stuttering [23], and for speech rating of disordered children's speech [24]. Various pre-training approaches have been used to detect Alzheimer's disease [25], [26], and heart failure [27]. However, only a few studies have applied these techniques on multi-class classification of voice disorders.…”

Section: Introduction (mentioning)

confidence: 99%

Hierarchical Multi-Class Classification of Voice Disorders Using Self-Supervised Models and Glottal Features

Tirronen, Kadiri, Alku 2023

IEEE Open J. Signal Process.

Previous studies on the automatic classification of voice disorders have mostly investigated the binary classification task, which aims to distinguish pathological voice from healthy voice. Using multi-class classifiers, however, more fine-grained identification of voice disorders can be achieved, which is more helpful for clinical practitioners. Unfortunately, there is little publicly available training data for many voice disorders, which lowers the classification performance on data from unseen speakers. Earlier studies have shown that the usage of glottal source features can reduce data redundancy in detection of laryngeal voice disorders. Another approach to tackle the problems caused by scarcity of training data is to utilize deep learning models, such as wav2vec 2.0 and HuBERT, that have been pre-trained on larger databases. Since the aforementioned approaches have not been thoroughly studied in the multi-class classification of voice disorders, they will be jointly studied in the present work. In addition, we study a hierarchical classifier, which enables task-wise feature optimization and more efficient utilization of data. In this work, the aforementioned three approaches are compared with traditional mel frequency cepstral coefficient (MFCC) features and one-vs-rest and one-vs-one SVM classifiers. The results in a 3-class classification problem between healthy voice and two laryngeal disorders (hyperfunctional dysphonia and vocal fold paresis) indicate that all the studied methods outperform the baselines. The best performance was achieved by using features from wav2vec 2.0 LARGE together with hierarchical classification. The balanced classification accuracy of the system was 62.77% for male speakers, and 55.36% for female speakers, which outperformed the baseline systems by an absolute improvement of 15.76% and 6.95% for male and female speakers, respectively.
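The abstract reports balanced classification accuracy without defining it. A common definition is the unweighted mean of per-class recall, and the sketch below assumes that definition; the function name and the toy labels are hypothetical and are not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed definition).

    Each class contributes equally, regardless of how many samples it has,
    so a classifier cannot score well by favouring the majority class.
    """
    classes = sorted(set(y_true))
    recalls = []
    for cls in classes:
        # Indices of the samples whose true label is this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical 3-class toy data: healthy, dysphonia, paresis.
    y_true = ["healthy"] * 4 + ["dysphonia", "paresis"]
    y_pred = ["healthy"] * 4 + ["dysphonia", "dysphonia"]
    print(balanced_accuracy(y_true, y_pred))
```

In this toy example, five of six samples are correct, but the paresis class is missed entirely, so the balanced score falls to two thirds while plain accuracy stays at five sixths.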
