Interspeech 2018
DOI: 10.21437/interspeech.2018-1630
Automatic Speech Assessment for People with Aphasia Using TDNN-BLSTM with Multi-Task Learning

Abstract: This paper describes an investigation on automatic speech assessment for people with aphasia (PWA) using a DNN-based automatic speech recognition (ASR) system. The main problems addressed are the lack of training speech in the intended application domain and the resulting degradation of ASR performance on the impaired speech of PWA. We adopt the TDNN-BLSTM structure for acoustic modeling and apply multi-task learning with a large amount of domain-mismatched data. This leads to a significant imp…

Cited by 24 publications (17 citation statements) · References 24 publications
“…Automatic assessment of pathological speech has also been researched, but, in general, the studies on the topic are related to specific aspects and populations. Some works focus on the speech intelligibility of people with aphasia [23,24] or speech intelligibility in pathological voices [25,26]. Others try to identify speech disorders in children with cleft lip and palate [27] or to predict automatically some dysarthric speech evaluation metrics, such as intelligibility, severity and articulation impairment [28,29].…”
Section: Introduction
confidence: 99%
“…All of the features are generated from the time alignment of a dedicated ASR system. Time-delay layers stacked with bidirectional long short-term memory layers (TDNN-BLSTM) are used as the acoustic model of the ASR system, which is trained using a multi-task learning strategy [15]. These ASR-generated features were shown to be effective in classifying High-AQ speakers from Low-AQ ones with respect to the acoustic impairment of PWA speech [14].…”
Section: Speaker-level Classification Accuracy
confidence: 99%
“…The development of the ASR system on impaired speech follows the multi-task learning approach of our previous work [9]. A time-delay neural network combined with bi-directional long short-term memory layers (TDNN-BLSTM) is shared by three phone-level acoustic modeling tasks.…”
Section: ASR System
confidence: 99%
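The shared-trunk multi-task setup quoted above can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation: the TDNN-BLSTM stack is stood in for by a single frame-splicing affine layer, and all dimensions, weight names, and the three-head layout are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def splice(frames, context=(-1, 0, 1)):
    """TDNN-style splicing: concatenate each frame with its neighbours,
    clamping at the utterance boundaries."""
    T, d = frames.shape
    out = np.zeros((T, d * len(context)))
    for t in range(T):
        parts = [frames[min(max(t + c, 0), T - 1)] for c in context]
        out[t] = np.concatenate(parts)
    return out

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Shared trunk (a stand-in for the TDNN-BLSTM stack) feeding three
# task-specific softmax heads, one per phone-level modelling task.
feat_dim, hidden, n_states = 40, 64, 100          # illustrative sizes
W_shared = rng.standard_normal((feat_dim * 3, hidden)) * 0.01
heads = [rng.standard_normal((hidden, n_states)) * 0.01 for _ in range(3)]

frames = rng.standard_normal((50, feat_dim))      # 50 frames of features
h = np.tanh(splice(frames) @ W_shared)            # shared representation
posteriors = [softmax(h @ W) for W in heads]      # one output per task
```

In training, each minibatch would update the shared trunk through whichever head matches the batch's corpus, which is how the domain-mismatched data regularises the in-domain model.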
“…A context-dependent GMM-HMM (CD-GMM-HMM) for each task is trained beforehand to generate state-level tri-phone alignments. Refer to [9] for details of the training corpora and CD-GMM-HMM training.…”
Section: ASR System
confidence: 99%