Proceedings of the Thirteenth Workshop on Innovative Use of NLP For Building Educational Applications 2018
DOI: 10.18653/v1/w18-0501
Using exemplar responses for training and evaluating automated speech scoring systems

Abstract: Automated scoring engines are usually trained and evaluated against human scores and compared to the benchmark of human-human agreement. In this paper we compare the performance of an automated speech scoring engine using two corpora: a corpus of almost 700,000 randomly sampled spoken responses with scores assigned by one or two raters during operational scoring, and a corpus of 16,500 exemplar responses with scores reviewed by multiple expert raters. We show that the choice of corpus used for model evaluation…

Cited by 10 publications
(8 citation statements)
References 24 publications
“…Rubric-based writing has drawbacks like rigid formulation of tasks (Warner, 2018), and many applications of rubrics are rooted in a racialized history difficult for technology to escape (Dixon-Román et al., 2019). Bias creeps into rubric writing and scoring of training data, unless extensive countermeasures are taken to maintain reliability across student backgrounds and varied response types (Loukina et al., 2018; West-Smith et al., 2018). It also limits flexibility in task choice and response type from students, limiting students to writing styles that mirror the norms of the dominant school culture.…”
Section: Case Study: Automated Writing Feedback and Scoring
confidence: 99%
“…Second, while there is a strong alignment between total and indicator human scores (r > 0.8), there is only moderate inter-rater agreement on the total and indicator scores (r = 0.4–0.5). We plan to both improve the rubric scoring and to collect more human ratings per transcript to allow distillation of the average human judgment as well as identification of cases that are more or less controversial for human raters; [25] we argue that evaluation on uncontroversial (exemplar) cases provides important information for understanding system performance. Previous research provides evidence that machine learning models are sometimes able to ignore biases and idiosyncrasies of specific human raters and agree with humans better than humans agree with each other.…”
Section: Discussion
confidence: 99%
“…Others: This paper presents the NLI-PT, the first Portuguese dataset compiled for native language identification (NLI), the task of identifying an author's first language based on their second language writing. (Loukina, Zechner, Bruno, & Beigman Klebanov, 2018) Question Classification, Evaluation: This paper compares the performance of an automated speech scoring engine using two corpora. (Rudzewitz et al., 2018) Question Classification, Feedback: This paper presents a novel approach leveraging task information to generate the expected range of well-formed and ill-formed variability in learner answers along with the required diagnosis and feedback.…”
Section: Others
confidence: 99%