Nonverbal Sound Detection for Disordered Speech

Huang, Zifang; Jain, Dhruv; Tooley, Lauren; Liaghat, Zeinab; Thelapurath, Shrinath; Findlater, Leah; Bigham, Jeffrey P.

doi:10.1109/icassp43922.2022.9747227

Cited by 8 publications (15 citation statements)

References 39 publications (41 reference statements)

“…Lea et al focused on intended speech transcription. They explored how people who stutter, where stuttering is characterized by an increase in disfluencies, interact with voice assistants and dictation services [26]. These services rely on ASR and Lea et al find that, for these services, individuals preferred to only see their intended speech transcribed.…”

Section: B. ASR Text and Disfluency Detection (mentioning)

confidence: 99%

“…(e.g. with voice assistants or speech dictation systems) [3], [25], [26], [31]. With these applications in mind, we present new techniques for automatic disfluency detection, categorization, and localization.…”

Section: Introduction (mentioning)

confidence: 99%

“…Using language introduced in [26], we define two opposing ASR goals for disfluent speech: intended versus verbatim speech transcription. Lea et al focused on intended speech transcription, which removes disfluencies and only transcribes the speakers' intentions [26]. As an example, if a user articulated "and uh we were I was fortunate...," an intended speech transcript would drop disfluencies and only include "and I was fortunate..." Intended speech transcription is useful for applications when disfluencies impede understanding of the speaker's intent, but it does not allow for disfluency detection, categorization, or localization because the disfluencies are not transcribed.…”

Section: Introduction (mentioning)

confidence: 99%

Toward A Multimodal Approach for Disfluency Detection and Categorization

Romana

2023

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Speech disfluencies, such as filled pauses or repetitions, are disruptions in the typical flow of speech. Stuttering is a speech disorder characterized by a high rate of disfluencies, but all individuals speak with some disfluencies and the rates of disfluencies may be increased by factors such as cognitive load. Clinically, automatic disfluency detection may help in treatment planning for individuals who stutter. Outside of the clinic, automatic disfluency detection may serve as a pre-processing step to improve natural language understanding in downstream applications. With this wide range of applications in mind, we investigate language, acoustic, and multimodal methods for frame-level automatic disfluency detection and categorization. Each of these methods relies on audio as an input. First, we evaluate several automatic speech recognition (ASR) systems in terms of their ability to transcribe disfluencies, measured using disfluency error rates. We then use these ASR transcripts as input to a language-based disfluency detection model. We find that disfluency detection performance is largely limited by the quality of transcripts and alignments. We find that an acoustic-based approach that does not require transcription as an intermediate step outperforms the ASR language approach. Finally, we present multimodal architectures which we find improve disfluency detection performance over the unimodal approaches. Ultimately, this work introduces novel approaches for automatic frame-level disfluency detection and categorization. In the long term, this will help researchers incorporate automatic disfluency detection into a range of applications.

“…Recently, Jaddoh (2021) and Lea (2022) have studied the use of nonverbal sound as a method of instruction to extend the ability of interacting with ASR systems or devices. Lea used recordings with different accents to develop a model that detects different mouth sounds, such as "pop" and "click," as inputs, while Jaddoh suggested using nonverbal sound as a technique to control virtual home assistance.…”

Section: Table 1 Summary of Speech Modalities Used in the Literature (mentioning)

confidence: 99%

Interaction between people with dysarthria and speech recognition systems: A review

Jaddoh; Loizides; Rana

2022

Assistive Technology

In recent years, rapid advancements have taken place for automatic speech recognition (ASR) systems and devices. Though ASR technologies have increased, the accessibility of these novel interaction systems is underreported and may present difficulties for people with speech impediments. In this article, we attempt to identify gaps in current research on the interaction between people with dysarthria and ASR systems and devices. We cover the period from 2011, when Siri (the first and the leading commercial voice assistant) was launched, to 2020. The review employs an interaction framework in which each element (user, input, system, and output) contributes to the interaction process. To select the articles for review, we conducted a search of scientific databases and academic journals. A total of 36 studies met the inclusion criteria, which included use of the word error rate (WER) as a measurement for evaluating ASR systems. This review determines that challenges in interacting with ASR systems persist even in light of the most recent commercial technologies. Further, understanding of the entire interaction process remains limited; thus, to improve this interaction, the recent progress of ASR systems must be elucidated.
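
The word error rate (WER) named in the inclusion criteria above is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal Python sketch of that convention, assuming whitespace tokenization and no text normalization. The function name word_error_rate is illustrative only and is not taken from the reviewed studies; the example strings reuse the intended versus verbatim transcript pair quoted earlier on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly
    (no casing or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word present in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Scoring a verbatim transcript against the intended-speech reference:
# the three disfluent words count as insertions, so WER = 3 / 4.
intended = "and I was fortunate"
verbatim = "and uh we were I was fortunate"
print(word_error_rate(intended, verbatim))  # 0.75
```

Because the denominator is the reference length, the choice of reference (intended or verbatim) changes the score for the same ASR output, which is one reason WER alone may understate or overstate difficulty for disfluent or dysarthric speech.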