Interspeech 2018
DOI: 10.21437/interspeech.2018-1630
Automatic Speech Assessment for People with Aphasia Using TDNN-BLSTM with Multi-Task Learning

Abstract: This paper describes an investigation on automatic speech assessment for people with aphasia (PWA) using a DNN-based automatic speech recognition (ASR) system. The main problems addressed are the lack of training speech in the intended application domain and the resulting degradation of ASR performance on the impaired speech of PWA. We adopt the TDNN-BLSTM structure for acoustic modeling and apply multi-task learning with a large amount of domain-mismatched data. This leads to a significant imp…

Cited by 24 publications (17 citation statements) · References 24 publications
“…Automatic assessment of pathological speech has also been researched, but, in general, the studies on the topic are related to specific aspects and populations. Some works focus on the speech intelligibility of people with aphasia [23,24] or speech intelligibility in pathological voices [25,26]. Others try to identify speech disorders in children with cleft lip and palate [27] or to predict automatically some dysarthric speech evaluation metrics, such as intelligibility, severity and articulation impairment [28,29].…”
Section: Introduction
confidence: 99%
“…All of the features are generated from the time alignment of a dedicated ASR system. Time-delay layers stacked with bidirectional long short-term memory layers (TDNN-BLSTM) are used as the acoustic model of the ASR system, which is trained using a multi-task learning strategy [15]. These ASR-generated features were shown to be effective in classifying High-AQ speakers from Low-AQ ones with respect to the acoustic impairment of PWA speech [14].…”
Section: Speaker-level Classification Accuracy
confidence: 99%
“…The development of the ASR system on impaired speech follows the multi-task learning approach of our previous work [9]. A time-delay neural network combined with bi-directional long short-term memory layers (TDNN-BLSTM) is shared by three phone-level acoustic modeling tasks.…”
Section: ASR System
confidence: 99%
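The shared-trunk multi-task setup quoted above can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation: the TDNN-BLSTM stack is stood in for by a single frame-splicing affine layer, and all dimensions, weight names, and the three-head layout are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def splice(frames, context=(-1, 0, 1)):
    """TDNN-style splicing: concatenate each frame with its neighbours,
    clamping at the utterance boundaries."""
    T, d = frames.shape
    out = np.zeros((T, d * len(context)))
    for t in range(T):
        parts = [frames[min(max(t + c, 0), T - 1)] for c in context]
        out[t] = np.concatenate(parts)
    return out

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Shared trunk (a stand-in for the TDNN-BLSTM stack) feeding three
# task-specific softmax heads, one per phone-level modelling task.
feat_dim, hidden, n_states = 40, 64, 100          # illustrative sizes
W_shared = rng.standard_normal((feat_dim * 3, hidden)) * 0.01
heads = [rng.standard_normal((hidden, n_states)) * 0.01 for _ in range(3)]

frames = rng.standard_normal((50, feat_dim))      # 50 frames of features
h = np.tanh(splice(frames) @ W_shared)            # shared representation
posteriors = [softmax(h @ W) for W in heads]      # one output per task
```

In training, each minibatch would update the shared trunk through whichever head matches the batch's corpus, which is how the domain-mismatched data regularises the in-domain model.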
“…A context-dependent GMM-HMM (CD-GMM-HMM) for each task is trained beforehand to generate state-level tri-phone alignments. Refer to [9] for details of the training corpora and CD-GMM-HMM training.…”
Section: ASR System
confidence: 99%