Interspeech 2017
DOI: 10.21437/interspeech.2017-1213

Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions

Abstract: We present a spoken dialog-based framework for the computer-assisted language learning (CALL) of conversational English. In particular, we leveraged the open-source HALEF dialog framework to develop a job interview conversational application. We then used crowdsourcing to collect multiple interactions with the system from non-native English speakers. We analyzed human-rated scores of the recorded dialog data on three different scoring dimensions critical to the delivery of conversational English - fluency, pronun…

Cited by 16 publications (16 citation statements)
References 19 publications

“…The low difference between the performance on training and corresponding test sets indicates that the models do not overfit the data. More importantly, the values of the achieved correlation coefficients resemble those reported in [13], related to human rater correlation, on a conversational task which is, in terms of difficulty for L2 learners, similar to some of the tasks analyzed in this paper.…”
Section: Classification Results and Conclusion (supporting)
Confidence: 78%
“…Since every utterance was scored by only one expert, it was not possible to evaluate any kind of agreement among experts. However, according to [13] and [14], inter-rater human correlation varies between around 0.6 and 0.9, depending on the type of proficiency test. In this work, correlation between an automatic rater and an expert one is between 0.53 and 0.61, indicating a good performance of the proposed system.…”
Section: Evaluation Campaigns on Trilinguism (mentioning)
Confidence: 99%
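As an illustration of the agreement figures quoted above, here is a minimal sketch of how such machine-human score agreement is typically measured with Pearson's r; the score arrays are made-up placeholder values, not data from the cited papers.

from scipy.stats import pearsonr

# Hypothetical per-response proficiency scores (illustrative values only)
human_scores = [3.0, 2.5, 4.0, 3.5, 2.0, 4.5, 3.0, 3.5]    # expert ratings
machine_scores = [2.8, 2.9, 3.7, 3.2, 2.4, 4.1, 3.3, 3.4]  # automatic ratings

# Pearson correlation between the automatic rater and the expert rater
r, p_value = pearsonr(human_scores, machine_scores)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")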
“…Incorporated different types of models and tested them. Ramanarayanan et al (2017) worked on feature extraction methods and extracted punctuation, fluency, and stress and trained different Machine Learning models for scoring. Knill et al (2018).…”
Section: Speech Response Scoring (mentioning)
Confidence: 99%
“…Automated scoring of multiple aspects of conversational proficiency is one way to address this need. While the automated scoring of text and speech data has been a well-explored topic for several years, particularly for essays and short constructed responses in the case of the former (Shermis and Burstein, 2013; Burrows et al, 2015; Madnani et al, 2017) and monolog speech for the latter (Neumeyer et al, 2000; Witt and Young, 2000; Xi et al, 2012; Bhat and Yoon, 2015), research on the interpretable automated scoring of dialog has only recently started gaining traction (Evanini et al, 2015; Litman et al, 2016; Ramanarayanan et al, 2017). Further, certain dialog constructs such as those pertaining to interaction - engagement, turn-taking and repair - are a lot less well-studied as compared to others like delivery and language use.…”
Section: Automated Scoring of Text Dialog (mentioning)
Confidence: 99%