2020
DOI: 10.48550/arxiv.2012.06259
Preprint

Improved Robustness to Disfluencies in RNN-Transducer Based Speech Recognition

Abstract: Automatic Speech Recognition (ASR) based on Recurrent Neural Network Transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aimed at improving the robustness of RNN-T ASR to speech disfluencies, with a focus on partial words. For evaluation, we use clean data, data with disfluencies, and a separate dataset with speech affected by stuttering. We show that after including a small amount of data with disfluencies in the training set, the recognition accurac…
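The abstract does not state how much disfluent data was added or how it was sampled. As a minimal sketch, assuming utterances are drawn uniformly at random and the target amount is expressed as a fraction of the combined training set, the mixing step could look like the following. The function name `mix_training_set` and the `disfluent_fraction` parameter are hypothetical and not taken from the paper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_training_set(clean: Sequence[T], disfluent: Sequence[T],
                     disfluent_fraction: float, seed: int = 0) -> List[T]:
    """Hypothetical helper: combine all clean utterances with enough randomly
    sampled disfluent utterances that they make up roughly
    `disfluent_fraction` of the result (capped by what is available)."""
    if not 0.0 <= disfluent_fraction < 1.0:
        raise ValueError("disfluent_fraction must be in [0, 1)")
    # Solve n_d / (n_c + n_d) = f for n_d:  n_d = f * n_c / (1 - f)
    n_needed = round(disfluent_fraction * len(clean) / (1.0 - disfluent_fraction))
    n_take = min(n_needed, len(disfluent))
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    mixed = list(clean) + rng.sample(list(disfluent), n_take)
    rng.shuffle(mixed)  # avoid a block of disfluent data at the end
    return mixed
```

For example, with 90 clean utterances and a target fraction of 0.1, the helper adds 10 disfluent utterances, giving a 100-utterance training list.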


Cited by 1 publication (4 citation statements); references 10 publications.
“…2.1.3 Dysfluent Speech Recognition. Technical work on improving speech assistants for PWS has focused on ASR models [8,23,31,35,50,51,61], stuttering detection [43], dysfluency detection or classification [22,40,42,48,56], clinical assessment [11], and dataset development [12,37,42,55]. Shonibare et al. [61] and Mendelev et al. [50] investigate training end-to-end RNN-T ASR models on speech from PWS.…”
Section: Overview of Speech Recognition Systems (mentioning, confidence: 99%)
“…Technical work on improving speech assistants for PWS has focused on ASR models [8,23,31,35,50,51,61], stuttering detection [43], dysfluency detection or classification [22,40,42,48,56], clinical assessment [11], and dataset development [12,37,42,55]. Shonibare et al. [61] and Mendelev et al. [50] investigate training end-to-end RNN-T ASR models on speech from PWS. Shonibare et al. introduce a detect-then-pass approach that incorporates a dysfluency detector where audio frames with dysfluencies are ignored entirely by the RNN-T decoder.…”
Section: Overview of Speech Recognition Systems (mentioning, confidence: 99%)
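The detect-then-pass idea attributed to Shonibare et al. is described here only at a high level. The sketch below illustrates just the filtering step, assuming a hypothetical frame-level detector that outputs one dysfluency probability per acoustic frame and a fixed decision threshold; frames flagged as dysfluent are dropped before the remaining frames are handed to the recognizer. The function name `drop_dysfluent_frames` and the default threshold of 0.5 are illustrative assumptions, not details from either paper.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one acoustic feature vector per frame


def drop_dysfluent_frames(frames: Sequence[Frame],
                          dysfluency_probs: Sequence[float],
                          threshold: float = 0.5) -> List[Frame]:
    """Hypothetical filtering step: keep only frames whose detector
    probability is below `threshold`; the rest are never seen downstream."""
    if len(frames) != len(dysfluency_probs):
        raise ValueError("need exactly one dysfluency probability per frame")
    return [frame for frame, p in zip(frames, dysfluency_probs) if p < threshold]
```

Raising the threshold keeps more frames (fewer are judged dysfluent), trading robustness to stuttered segments against the risk of discarding fluent speech the detector misclassifies.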