2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8461289
Automatic Speech Assessment for Aphasic Patients Based on Syllable-Level Embedding and Supra-Segmental Duration Features

Cited by 16 publications (22 citation statements) | References 6 publications
“…If the overall score is higher than 0.5, the test speaker is classified as High-AQ; otherwise the speaker is classified as Low-AQ. The baseline assessment system in this study follows a conventional two-step assessment approach proposed in our previous study [14]. A 5-dimensional feature vector of supra-segmental duration features is evaluated on the same task of binary classification using a random forest classifier.…”
Section: Speaker-level Classification Accuracy
confidence: 99%
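The decision rule quoted above (an overall score above 0.5 labels the test speaker High-AQ, otherwise Low-AQ) can be sketched in plain Python. Averaging per-utterance classifier scores into the overall score is an assumption for illustration; the cited system obtains its scores from a random forest over the 5-dimensional supra-segmental duration features, not from this helper.

```python
# Sketch of the speaker-level decision rule described in the excerpt:
# overall score > 0.5 -> High-AQ, otherwise Low-AQ.
# How the per-utterance scores are combined is assumed here (simple mean);
# the excerpt only specifies the thresholding step.

def classify_speaker(utterance_scores, threshold=0.5):
    """Return 'High-AQ' or 'Low-AQ' from per-utterance scores in [0, 1]."""
    overall = sum(utterance_scores) / len(utterance_scores)
    return "High-AQ" if overall > threshold else "Low-AQ"

print(classify_speaker([0.8, 0.6, 0.7]))  # → High-AQ
print(classify_speaker([0.2, 0.4, 0.3]))  # → Low-AQ
```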
“…Time-delay layers stacked with bidirectional long short-term memory layers (TDNN-BLSTM) are used as the acoustic model of the ASR system, which is trained using a multi-task learning strategy [15]. These ASR-generated features were shown to be effective in separating High-AQ speakers from Low-AQ ones with respect to the acoustic impairment of PWA speech [14]. Table 6 lists the speaker-level binary classification results on 91 test speakers.…”
Section: Speaker-level Classification Accuracy
confidence: 99%
“…In our previous study [5], a standard DNN-based ASR system for assessment was trained with limited domain-matched healthy speech. MTL provides a potential way to use domain-mismatched datasets to tackle the data scarcity problem.…”
Section: ASR System for Aphasia Assessment 3.1 MT-TDNN-BLSTM Model
confidence: 99%
“…In our previous study [5], a framework for fully automatic speech assessment of Cantonese-speaking PWA was developed. A domain-matched ASR system trained with unimpaired speech was used to decode PWA speech into syllable sequences with time-alignment information.…”
Section: Introduction
confidence: 99%
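The framework quoted above decodes PWA speech into time-aligned syllable sequences, from which supra-segmental duration features are then computed. A minimal stdlib sketch, assuming each syllable is a (start, end) pair in seconds; the excerpts do not list the actual five features of [14], so the statistics below (speaking rate, mean and maximum syllable duration, pause ratio, duration range) are illustrative stand-ins only.

```python
# Illustrative supra-segmental duration features from a time-aligned syllable
# sequence of (start, end) pairs in seconds. The real 5-dimensional feature
# set of the cited system is not specified in the excerpts; these are
# plausible stand-ins, not the authors' definitions.

def duration_features(syllables):
    durs = [end - start for start, end in syllables]
    total = syllables[-1][1] - syllables[0][0]      # overall speaking span
    pause = total - sum(durs)                       # inter-syllable silence
    return {
        "speaking_rate": len(syllables) / total,    # syllables per second
        "mean_syl_dur": sum(durs) / len(durs),
        "max_syl_dur": max(durs),
        "pause_ratio": pause / total,
        "dur_range": max(durs) - min(durs),
    }

feats = duration_features([(0.0, 0.3), (0.4, 0.9), (1.1, 1.5)])
```

Such a fixed-length vector can then be fed to any standard classifier (the excerpts use a random forest) for the High-AQ/Low-AQ decision.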