2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI)
DOI: 10.1109/icacci.2018.8554455

Speech Recognition and Correction of a Stuttered Speech

Cited by 15 publications (13 citation statements)
References 3 publications
“…Regarding the feasibility of automatic extraction, for the pacing of syllable pronunciation, because of MONAH's reliance on Google Speech-to-Text, the returned timings were at word level instead of the required syllable level. Commercial systems typically return timings at word level or sentence level, and it would take a specialized speech recognition system to return syllable-level information [50]. As for the volume and pitch, granular timestamped information could be readily extracted through open-source packages like OpenSmile [51].…”
Section: Discussion Of Results From Aspects Non-verbal Annotations (mentioning, confidence: 99%)
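
The statement above turns on two practical points: commercial recognizers such as Google Cloud Speech-to-Text expose timings only down to word level, while frame-level prosodic measures (volume, pitch) can be pulled with openSMILE. The sketch below is not taken from the cited papers; it is an illustrative example in which the file name interview.wav, the 16 kHz LINEAR16 settings, and the choice of the eGeMAPSv02 descriptor set are all assumptions.

# Illustrative sketch only (not from the cited papers): word-level timings
# via Google Cloud Speech-to-Text and timestamped pitch/loudness descriptors
# via the openSMILE Python package. File name and audio settings are assumed.
from google.cloud import speech
import opensmile

def word_level_timings(path: str):
    """Return (word, start_s, end_s) tuples; the API does not go below word level."""
    client = speech.SpeechClient()
    with open(path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,          # assumed 16 kHz mono recording
        language_code="en-US",
        enable_word_time_offsets=True,    # word-level, not syllable-level
    )
    response = client.recognize(config=config, audio=audio)
    return [
        (w.word, w.start_time.total_seconds(), w.end_time.total_seconds())
        for result in response.results
        for w in result.alternatives[0].words
    ]

def prosodic_descriptors(path: str):
    """Timestamped low-level descriptors (loudness, F0, ...) via openSMILE."""
    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.eGeMAPSv02,
        feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
    )
    return smile.process_file(path)   # pandas DataFrame indexed by (start, end)

words = word_level_timings("interview.wav")     # hypothetical file
llds = prosodic_descriptors("interview.wav")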
“…Some of the bottlenecks of the above papers are speech recognition systems that are confined to regional languages, and some of the papers fail to discuss the correction of stuttered speech. Amplitude thresholding is done using neural networks [9], but the process is complex. Some of these issues are addressed in this paper, which discusses different methods for the removal of prolongations and string repetitions.…”
Section: Literature Review (mentioning, confidence: 99%)
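
For context on the amplitude-thresholding remark, here is a minimal sketch of what a plain, non-neural amplitude-thresholding baseline might look like: it flags spans whose short-time RMS amplitude falls below a fixed threshold. The frame/hop sizes and the 0.01 threshold (on a signal normalized to [-1, 1]) are illustrative assumptions; the cited work [9] replaces this step with a neural network.

# Minimal, illustrative amplitude-thresholding baseline (assumption: not the
# cited paper's method). Flags low-amplitude spans of a normalized mono signal.
import numpy as np

def low_amplitude_spans(signal: np.ndarray, sr: int,
                        frame_ms: float = 25.0, hop_ms: float = 10.0,
                        threshold: float = 0.01):
    """Return (start_s, end_s) spans whose frame RMS falls below `threshold`."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    spans, start = [], None
    for i in range(0, max(len(signal) - frame, 0) + 1, hop):
        rms = np.sqrt(np.mean(signal[i:i + frame] ** 2))
        t = i / sr
        if rms < threshold and start is None:
            start = t                      # entering a low-amplitude region
        elif rms >= threshold and start is not None:
            spans.append((start, t))       # leaving it
            start = None
    if start is not None:
        spans.append((start, len(signal) / sr))
    return spans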
“…The focus of this paper is on detection of five stuttering event types: Blocks, Prolongations, Sound Repetitions, Word/Phrase Repetitions, and Interjections. Existing work has explored this problem using traditional signal processing techniques [15,16,17], language modeling (LM) [12,18,19,20,21], and acoustic modeling (AM) [21,10]. Each approach has been shown to be effective.…”
Section: Introduction (mentioning, confidence: 99%)