Interspeech 2017
DOI: 10.21437/interspeech.2017-1213

Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions

Abstract: We present a spoken dialog-based framework for the computer-assisted language learning (CALL) of conversational English. In particular, we leveraged the open-source HALEF dialog framework to develop a job interview conversational application. We then used crowdsourcing to collect multiple interactions with the system from non-native English speakers. We analyzed human-rated scores of the recorded dialog data on three different scoring dimensions critical to the delivery of conversational English - fluency, pronun…

Cited by 16 publications (16 citation statements)
References 19 publications

“…The low difference between the performance on training and corresponding test sets indicates that the models do not overfit the data. More importantly, the values of the achieved correlation coefficients resemble those reported in [13], related to human rater correlation, on a conversational task which is, in terms of difficulty for L2 learners, similar to some of the tasks analyzed in this paper.…”
Section: Classification Results and Conclusion (supporting)
Confidence: 78%
“…Since every utterance was scored by only one expert, it was not possible to evaluate any kind of agreement among experts. However, according to [13] and [14], inter-rater human correlation varies between around 0.6 and 0.9, depending on the type of proficiency test. In this work, correlation between an automatic rater and an expert one is between 0.53 and 0.61, indicating a good performance of the proposed system.…”
Section: Evaluation Campaigns on Trilinguism (mentioning)
Confidence: 99%
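As an illustration of the agreement figures quoted above, here is a minimal sketch of how such machine-human score agreement is typically measured with Pearson's r; the score arrays are made-up placeholder values, not data from the cited papers.

from scipy.stats import pearsonr

# Hypothetical per-response proficiency scores (illustrative values only)
human_scores = [3.0, 2.5, 4.0, 3.5, 2.0, 4.5, 3.0, 3.5]    # expert ratings
machine_scores = [2.8, 2.9, 3.7, 3.2, 2.4, 4.1, 3.3, 3.4]  # automatic ratings

# Pearson correlation between the automatic rater and the expert rater
r, p_value = pearsonr(human_scores, machine_scores)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")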
“…Incorporated different types of models and tested them. Ramanarayanan et al (2017) worked on feature extraction methods and extracted punctuation, fluency, and stress and trained different Machine Learning models for scoring. Knill et al (2018).…”
Section: Speech Response Scoring (mentioning)
Confidence: 99%
“…Automated scoring of multiple aspects of conversational proficiency is one way to address this need. While the automated scoring of text and speech data has been a well-explored topic for several years, particularly for essays and short constructed responses in the case of the former (Shermis and Burstein, 2013; Burrows et al, 2015; Madnani et al, 2017) and monolog speech for the latter (Neumeyer et al, 2000; Witt and Young, 2000; Xi et al, 2012; Bhat and Yoon, 2015), research on the interpretable automated scoring of dialog has only recently started gaining traction (Evanini et al, 2015; Litman et al, 2016; Ramanarayanan et al, 2017). Further, certain dialog constructs such as those pertaining to interaction - engagement, turn-taking and repair - are a lot less well-studied as compared to others like delivery and language use.…”
Section: Automated Scoring of Text Dialog (mentioning)
Confidence: 99%