Interspeech 2021
DOI: 10.21437/interspeech.2021-2006
Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

Abstract: Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice-operated systems do not work. Current speech recognition systems are trained primarily with data from fluent speakers and as a consequence do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. The focus of this work is on quantitative analysis of a consumer speech …
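The truncated abstract points to a quantitative analysis of recognition performance on dysfluent speech. The paper's own metrics are not reproduced on this page; the standard measure for this kind of comparison is word error rate (WER). The sketch below is background only, not the authors' evaluation code, and the example transcripts are invented.

```python
# Minimal word error rate (WER) computation via word-level edit distance.
# Illustrative only: it shows the standard measure used to quantify how
# dysfluencies (e.g. sound repetitions) degrade recognition accuracy.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: a sound repetition ("c- c-") inflates the error rate.
print(wer("call mom now", "c c call mom now"))  # 2 insertions / 3 ref words = 0.67
```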

Cited by 14 publications (11 citation statements)
References 13 publications
“…Visualization: Figure. 4 shows the embeddings computed from the statistics pooling layer. In MTL scheme, it is evident from the well formed podcast clusters that the model is trying to learn podcast dependent stuttering information.…”
Section: Results
Mentioning (confidence: 99%)
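The excerpt above refers to embeddings taken from a statistics pooling layer trained under a multi-task learning (MTL) scheme. As background, a statistics pooling layer turns a variable-length sequence of frame-level features into a fixed-size utterance embedding by concatenating the per-dimension temporal mean and standard deviation. The PyTorch sketch below illustrates that operation only; it is not the cited paper's implementation, and the tensor shapes are arbitrary.

```python
import torch
import torch.nn as nn

class StatisticsPooling(nn.Module):
    """Collapse frame-level features (batch, time, feat) into a fixed-size
    utterance embedding by concatenating the temporal mean and std."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x.mean(dim=1)                    # (batch, feat)
        std = x.std(dim=1, unbiased=False)      # (batch, feat)
        return torch.cat([mean, std], dim=1)    # (batch, 2 * feat)

# Usage: 4 utterances, 200 frames, 64-dim frame features -> 128-dim embeddings.
frames = torch.randn(4, 200, 64)
embeddings = StatisticsPooling()(frames)
print(embeddings.shape)  # torch.Size([4, 128])
```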
“…However, this method of detecting stuttering is very arduous and time-intensive, and is also prejudiced towards the subjective belief of speech therapist. Besides, automatic speech recognition (ASR) tools also fail to recognize the stuttered speech [4], that makes it unrealistic to easily access virtual assistants like Alexa, Apple Siri, Cortana, etc. for PWS.…”
Section: Introduction
Mentioning (confidence: 99%)
“…This convention of SD is very laborious and timeconsuming and is also inclined toward the idiosyncratic belief of speech therapists. In addition, the automatic speech recognition systems (ASR) are working fine for normal fluent speech, however, they are unsuccessful in recognizing the stuttered speech [9], which makes it impractical for PWS to easily access virtual assistants like Apple Siri, Alexa, etc. As a result, interactive automatic SD systems that provide an impartial objective, and consistent evaluation of stuttering speech are strongly encouraged.…”
Section: Introduction
Mentioning (confidence: 99%)
“…Fig. 3: A schematic diagram of Multi-contextual StutterNet, which is a multi-class classifier that exploits different variable contexts of (5, 9) in SD. The FluentBranch and DisfluentBranch are composed of 3 fully connected layers followed by a softmax layer for prediction of different stuttering classes. CB: Context Block, SPL: Statistical Pooling Layer.…”
Mentioning (confidence: 99%)
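The caption quoted above describes a two-branch, multi-context classifier: context blocks over two context sizes (5 and 9), a statistical pooling layer, and FluentBranch/DisfluentBranch heads of three fully connected layers each ending in softmax. The PyTorch sketch below reproduces only that rough topology; the convolutional context blocks, layer widths, feature dimension, and class counts are placeholders, not the cited paper's hyperparameters.

```python
import torch
import torch.nn as nn

def branch(in_dim: int, hidden: int, num_classes: int) -> nn.Sequential:
    """Three fully connected layers followed by softmax, as in the quoted caption."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, num_classes),
        nn.Softmax(dim=-1),
    )

class MultiContextClassifier(nn.Module):
    """Sketch of a multi-contextual two-branch classifier: two 1-D convolutional
    context blocks with kernel sizes 5 and 9, statistics pooling over time, and
    separate fluent/disfluent branches. All sizes here are placeholders."""

    def __init__(self, feat_dim: int = 20, channels: int = 64,
                 num_disfluency_classes: int = 5):
        super().__init__()
        self.context5 = nn.Conv1d(feat_dim, channels, kernel_size=5, padding=2)
        self.context9 = nn.Conv1d(feat_dim, channels, kernel_size=9, padding=4)
        pooled = 2 * channels * 2  # two contexts, mean + std each
        self.fluent_branch = branch(pooled, 128, 2)  # fluent vs. disfluent
        self.disfluent_branch = branch(pooled, 128, num_disfluency_classes)

    @staticmethod
    def _stats_pool(x: torch.Tensor) -> torch.Tensor:
        # Concatenate temporal mean and std over the time axis.
        return torch.cat([x.mean(dim=-1), x.std(dim=-1, unbiased=False)], dim=-1)

    def forward(self, feats: torch.Tensor):
        # feats: (batch, feat_dim, time), e.g. MFCC frames.
        e = torch.cat([self._stats_pool(torch.relu(self.context5(feats))),
                       self._stats_pool(torch.relu(self.context9(feats)))], dim=-1)
        return self.fluent_branch(e), self.disfluent_branch(e)

# Usage: a batch of 8 clips, 20-dim features, 300 frames.
fluent_probs, disfluency_probs = MultiContextClassifier()(torch.randn(8, 20, 300))
print(fluent_probs.shape, disfluency_probs.shape)  # torch.Size([8, 2]) torch.Size([8, 5])
```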
“…(e.g. with voice assistants or speech dictation systems) [3], [25], [26], [31]. With these applications in mind, we present new techniques for automatic disfluency detection, categorization, and localization.…”
Section: Introduction
Mentioning (confidence: 99%)