Interspeech 2022
DOI: 10.21437/interspeech.2022-896

Using Fluency Representation Learned from Sequential Raw Features for Improving Non-native Fluency Scoring

Abstract: Speech fluency/disfluency can be evaluated by analyzing a range of phonetic and prosodic features. Deep neural networks are commonly trained to map fluency-related features into the human scores. However, the effectiveness of deep learning-based models is constrained by the limited amount of labeled training samples. To address this, we introduce a self-supervised learning (SSL) approach that takes into account phonetic and prosody awareness for fluency scoring. Specifically, we first pre-train the model using…

Cited by 4 publications (6 citation statements)
References 27 publications (57 reference statements)

“…To verify the performance of the proposed approach in the "read aloud" scenario, we further conduct experiments on two more datasets, ByteRead and Speechocean762. ByteRead is another internal dataset collected under the "read aloud" scenario and a detailed description is given in [16]. Speechocean762 is an open-sourced speech assessment corpus with 5,000 utterances collected from 250 speakers [27].…”
Section: Experimental Results in the "Read Aloud" Scenario (mentioning)
confidence: 99%
“…In [16], an ASR-based fluency scoring system was reported with encouraging results in the "read aloud" scenario. To extend the system to the "open response" scenario, an additional step of speech-to-text conversion can be adopted.…”
Section: ASR-based Fluency Scoring System (mentioning)
confidence: 99%