Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

Mitra, Vikramjit; Huang, Zifang; Lea, C. H.; Tooley, Lauren; Wu, Sarah; Botten, Darren; Palekar, Ashwini; Thelapurath, Shrinath; Georgiou, Panayiotis G.; Kajarekar, Sachin; Bigham, Jeffrey

doi:10.48550/arxiv.2106.11759

Cited by 2 publications

(8 citation statements)

References 0 publications


“…Work conducted after the closing date of this systematic review suggests that deep learning techniques may further improve automatic classification of stuttering [23] [64]. Further work is needed to confirm whether modern deep methods can allow advances in performance as seen in other fields such as computer vision [65].…”

Section: B Recommendations (mentioning)

confidence: 95%

Systematic Review of Machine Learning Approaches for Detecting Developmental Stuttering

Barrett, Howell, 2022

IEEE/ACM Trans. Audio Speech Lang. Process.


A systematic review of the literature on statistical and machine learning schemes for identifying symptoms of developmental stuttering from audio recordings is reported. Twenty-seven papers met the quality standards that were set. Comparison of results across studies was not possible because training and testing data, model architecture and feature inputs varied across studies. The limitations that were identified for comparison across studies included: no indication of application for the work, data were selected for training and testing models in ways that could lead to biases, studies used different datasets and attempted to locate different symptom types, feature inputs were reported in different ways and there was no standard way of reporting performance statistics. Recommendations were made about how these problems can be addressed in future work on this topic.



“…2.1.3 Dysfluent Speech Recognition. Technical work on improving speech assistants for PWS has focused on ASR models [8,23,31,35,50,51,61], stuttering detection [43], dysfluency detection or classification [22,40,42,48,56], clinical assessment [11], and dataset development [12,37,42,55]. Shonibare et al [61] and Mendelev et al [50] investigate training end-to-end RNN-T ASR models on speech from PWS.…”

Section: Overview Of Speech Recognition Systems (mentioning)

confidence: 99%

“…Alharbi et al [2,3] focused on stuttered speech from kids that incorporates the structure of repetitions and other dysfluencies into an augmented language model that is better at including dysfluencies in a transcription. In our work, we focus on solutions, like Mitra et al's [51], which can be applied on top of existing recognition systems and do not require as much data as end-to-end solutions. Their VA-oriented approach was to optimize a small set of ASR decoder parameters on stuttered speech, such that the system is biased towards common VA phrases, and was effective in removing dysfluencies such as repetitions in speech.…”

Section: Overview Of Speech Recognition Systems (mentioning)

confidence: 99%

“…Research on speech technology for PWS has largely focused on technical improvements to automatic speech recognition (ASR) models [31,35,50,51,61], dysfluency detection [22,40,42,48], and dataset development [12,37,42,55]. This body of work has largely lacked a human-centered approach to understanding the experiences that PWS have with speech recognition systems [17], which could in turn inform how to prioritize and advance technical improvements.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, people with more moderate or severe patterns encountered high truncation (e.g., > 20%) and word error rates (13.6% to 49.2%), which is also reflected by many survey participants. Motivated by these results, we describe and evaluate three technical solutions (two new and one previously published [51]) that apply production-oriented improvements to a consumer-grade ASR system. These solutions reduce truncation rates by 79.1% and improve word error rates in transcribed speech from 25.4% to 9.9% for a set of participants with moderate to severe dysfluent speech.…”

Section: Introduction (mentioning)

confidence: 99%


From User Perceptions to Technical Improvement: Enabling People Who Stutter to Better Use Speech Recognition

Lea, Huang, Tooley et al. 2023

Preprint


Consumer speech recognition systems do not work as well for many people with speech differences, such as stuttering, relative to the rest of the general population. However, what is not clear is the degree to which these systems do not work, how they can be improved, or how much people want to use them. In this paper, we first address these questions using results from a 61-person survey from people who stutter and find participants want to use speech recognition but are frequently cut off, misunderstood, or speech predictions do not represent intent. In a second study, where 91 people who stutter recorded voice assistant commands and dictation, we quantify how dysfluencies impede performance in a consumer-grade speech recognition system. Through three technical investigations, we demonstrate how many common errors can be prevented, resulting in a system that cuts utterances off 79.1% less often and improves word error rate from 25.4% to 9.9%.

CCS Concepts: Human-centered computing → Empirical studies in accessibility.
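For readers unfamiliar with the metric behind these figures, the following is a minimal sketch of how word error rate (WER) is commonly computed: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. It is not the evaluation code of the cited work. The function names `word_error_rate` and `relative_reduction`, the whitespace tokenization, and the example phrases are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no text normalization, which real evaluations usually apply).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative improvement when a rate drops from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical example: word repetitions transcribed verbatim count
    # as insertions against the intended command.
    intended = "set a timer for five minutes"
    recognized = "set set a timer for for five minutes"
    print(f"WER: {word_error_rate(intended, recognized):.3f}")

    # The abstract's reported change in WER, expressed as a relative reduction.
    print(f"Relative WER reduction: {relative_reduction(25.4, 9.9):.1%}")
```

In the hypothetical example, two repeated words against a six-word reference give a WER of 2/6. Applied to the figures quoted in the abstract, a drop from 25.4% to 9.9% corresponds to a relative reduction of roughly 61%.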
