Interspeech 2022
DOI: 10.21437/interspeech.2022-680

Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

Abstract: A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors, such as gender, and speech impairment severity. Most prior research addressing this issue has focused on speaker identity alone. To this end, this paper proposes a novel set of techniques that use both severity and speaker identity in dysarthric speech recognition: a) multitask training incorporating severity prediction error; b) speaker-severity aware auxiliary feature ada…
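Item a) in the abstract pairs the recognition objective with a severity prediction error. The sketch below is minimal and rests on assumptions that are not taken from the paper. It treats severity as a discrete classification target scored with cross-entropy. It combines the two errors by a weighted sum. The names `severity_cross_entropy`, `multitask_loss` and `weight` are hypothetical. Under those assumptions, a per-utterance multitask objective could be computed as:

```python
import math
from typing import Sequence


def severity_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy between softmax(logits) and one severity class index.

    Assumption: severity is modelled as a discrete class, not a regression target.
    """
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(asr_loss: float, severity_logits: Sequence[float],
                   severity_label: int, weight: float = 0.1) -> float:
    """Add a weighted severity prediction error to an already-computed ASR loss.

    `weight` is a hypothetical interpolation hyperparameter, not a value from the paper.
    """
    return asr_loss + weight * severity_cross_entropy(severity_logits, severity_label)


if __name__ == "__main__":
    # Uniform logits over four hypothetical severity classes give a
    # cross-entropy of log(4), so this prints 10 + 0.5 * log(4).
    print(multitask_loss(10.0, [0.0, 0.0, 0.0, 0.0], severity_label=2, weight=0.5))
```

When `weight` is set to 0, the objective reduces to the plain ASR loss. Larger values push the shared encoder to retain severity-discriminative information.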

Cited by 7 publications (1 citation statement)
References 37 publications
“…The performance of automatic speech recognition (ASR) systems has been significantly improved over the past decades with the wide application of deep learning techniques [1]-[15]. More recently there has been a major trend in the speech technology field transiting from hybrid ASR systems [1]-[3] to end-to-end (E2E) modelling [4]-[8], [10], [11] which utilizes a single neural network to directly map acoustic feature vectors to the surface word or token sequences.…”
Section: Introduction (mentioning, confidence: 99%)