ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9413618

Improved Robustness to Disfluencies in RNN-Transducer Based Speech Recognition

Abstract: Automatic Speech Recognition (ASR) based on Recurrent Neural Network Transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aiming for improved robustness of RNN-T ASR to speech disfluencies, with a focus on partial words. For evaluation, we use clean data, data with disfluencies, and a separate dataset with speech affected by stuttering. We show that after including a small amount of data with disfluencies in the training set the recognition accurac…

Cited by 12 publications (6 citation statements)
References: 12 publications

“…A recent preprint by Mendelev et al [17] is most similar to our work. They built an end-to-end speech recognition model using typical speech and speech with dysfluencies and show a 16% relative improvement in WER on some voice command tasks for users who stutter compared to a baseline without stuttered speech.…”
Section: Introduction
Citation type: supporting
confidence: 90%
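
The 16% figure quoted above is a relative improvement, that is, the WER reduction expressed as a fraction of the baseline WER. A minimal sketch of that arithmetic follows; the function name and the two WER values are hypothetical and chosen only for illustration, they are not results from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values (not taken from the paper).
baseline = 0.25   # baseline model trained without disfluent speech
improved = 0.21   # model trained with some disfluent speech added
print(f"{relative_wer_improvement(baseline, improved):.0%}")  # prints "16%"
```

Note that a relative improvement of 16% here corresponds to an absolute reduction of only 4 WER points, which is why the two are usually reported separately.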
“…al. [9] studied the robustness of RNN-T based ASR models on disfluent speech that contained organic disfluencies like partial words using filters on utterance transcriptions that are indicative of hesitations and repetitions. We introduce the term organic disfluency to distinguish the speech containing hesitations and repetitions from people who do not self identify as People Who Stutter or have Stutter is observed in 5% to 10% of children's speech, who are aged between 2 and 6 years [11].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…The research team initially collected data on speech dysfluencies and subsequently utilized this data to retrain their algorithms. By increasing the amount of training data with dysfluencies, they successfully improved the accuracy of their algorithms [30].…”
Section: Automatic Speech Recognition For Users With Diverse Needs
Citation type: mentioning
confidence: 99%
“…Only by better characterizing the causes and types of errors in the recognition algorithms can it be improved to meet the needs of all user groups [1]. Hence, to facilitate the characterization of speech errors, we analyzed the substitution, deletion, and insertion of speech recognition when users with Down Syndrome interact with speech algorithms, following the approach from related work targeting speech differences [29][30][31].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
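
The substitution, deletion, and insertion analysis mentioned in the statement above is conventionally obtained from a word-level edit-distance alignment between a reference transcript and the recognizer output. The following is a generic sketch of that alignment, not the cited authors' implementation; `ErrorCounts`, `count_errors`, and the example utterances are hypothetical names and data introduced for illustration.

```python
from typing import NamedTuple


class ErrorCounts(NamedTuple):
    substitutions: int
    deletions: int
    insertions: int
    reference_length: int

    @property
    def wer(self) -> float:
        """Word error rate: (S + D + I) / number of reference words."""
        errors = self.substitutions + self.deletions + self.insertions
        return errors / self.reference_length


def count_errors(reference: str, hypothesis: str) -> ErrorCounts:
    """Align two transcripts word by word and count each error type."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # cost[i][j] = (total edits, subs, dels, ins) for ref[:i] vs hyp[:j].
    cost = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]  # match, no edit needed
                continue
            t, s, d, n = cost[i - 1][j - 1]
            substitution = (t + 1, s + 1, d, n)
            t, s, d, n = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = cost[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Tuples compare by total edits first, so this keeps a
            # minimum-edit alignment.
            cost[i][j] = min(substitution, deletion, insertion)

    _, subs, dels, ins = cost[-1][-1]
    return ErrorCounts(subs, dels, ins, len(ref))


# Hypothetical utterance with a repeated word and a partial word.
counts = count_errors("turn on the kitchen light", "turn on on the kit light")
print(counts, counts.wer)
```

In this example the repeated "on" is counted as an insertion and the partial word "kit" as a substitution, which illustrates how disfluencies that reach the recognizer output show up in the error breakdown.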