Automatic Fluency Assessment Method for Spontaneous Speech without Reference Text

Liu, Jiajun; Wumaier, Aishan; Fan, Cong; Guo, Shen

doi:10.3390/electronics12081775

Cited by 2 publications

(1 citation statement)

References 40 publications

“…PSC-PS-DF consists of propositional speaking files for the Mandarin Proficiency Test (Putonghua Shuiping Ceshi, PSC). Previous work has shown that, due to the nature of Chinese propositional speaking, which requires the speaker to freely describe a topic for three minutes without any reference text, the speech files contain a large number of disfluent features, such as "um", "ah", and "uh" interjections, blocks, prolongations, and repetition, but such disfluent features are rarely marked and used in research [55]. In this study, disfluent features were annotated in Chinese propositional speaking data to obtain spontaneously spoken disfluent features in Chinese for the detection of disfluent Chinese-language speech.…”

Section: PSC-PS-DF (mentioning)

confidence: 99%

Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths

et al. 2023

Self Cite

Speech is critical for interpersonal communication, but not everyone has fluent communication skills. Speech disfluency, including stuttering and interruptions, affects not only emotional expression but also clarity of expression for people who stutter. Existing methods for detecting speech disfluency rely heavily on annotated data, which can be costly. Additionally, these methods have not considered the issue of variable-length disfluent speech, which limits the scalability of detection methods. To address these limitations, this paper proposes an automated method for detecting speech disfluency that can improve communication skills for individuals and assist therapists in tracking the progress of stuttering patients. The proposed method focuses on detecting four types of disfluency features using single-task detection and utilizes embeddings from the pre-trained wav2vec2.0 model, as well as convolutional neural network (CNN) and Transformer models for feature extraction. The model’s scalability is improved by considering the issue of variable-length disfluent speech and modifying the model based on the entropy invariance of attention mechanisms. The proposed automated method for detecting speech disfluency has the potential to assist individuals in overcoming speech disfluency, improve their communication skills, and aid therapists in tracking the progress of stuttering patients. Additionally, the model’s scalability across languages and lengths enhances its practical applicability. The experiments demonstrate that the model outperforms baseline models in both English and Chinese datasets, proving its universality and scalability in real-world applications.