Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-3008
Atypical Inputs in Educational Applications

Abstract: In large-scale educational assessments, the use of automated scoring has recently become quite common. While the majority of student responses can be processed and scored without difficulty, there are a small number of responses that have atypical characteristics that make it difficult for an automated scoring system to assign a correct score. We describe a pipeline that detects and processes these kinds of responses at run-time. We present the most frequent kinds of what are called non-scorable responses alon…
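The abstract describes a run-time filter that flags atypical, non-scorable responses before they reach the scoring engine. As a rough illustration only (not the paper's actual system), a minimal sketch of such a filter might look like this; the checks and thresholds below are invented for demonstration:

```python
import re

def flag_non_scorable(response: str, min_words: int = 5):
    """Return a flag label if the response looks atypical, else None.

    Hypothetical heuristics, not the authors' pipeline:
    - too few words to score,
    - gibberish (few vowel-bearing tokens),
    - canned/repetitive text (one token dominates).
    """
    words = response.split()
    if len(words) < min_words:
        return "too_short"
    # Gibberish heuristic: low ratio of tokens containing a vowel.
    vowelish = sum(1 for w in words if re.search(r"[aeiouAEIOU]", w))
    if vowelish / len(words) < 0.5:
        return "possible_gibberish"
    # Canned-response heuristic: a single token makes up most of the text.
    if max(words.count(w) for w in set(words)) / len(words) > 0.5:
        return "repetitive"
    return None
```

Responses flagged by such a module would be routed to human scoring rather than receiving an unreliable automated score.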


Cited by 11 publications (7 citation statements). References 13 publications.
“…Features related to language use covered vocabulary, grammar and some aspects of discourse structure. An additional module was used to flag atypical responses where an automated score is likely to be unreliable [11,15]. See [12] for a detailed description of the features and the filtering module.…”
Section: Automated Scoring Engine
confidence: 99%
“…1) Several research studies have shown that essay scoring models are overstable (Yoon et al., 2018; Powers et al., 2002; Feng et al., 2018). Even large changes in essay content do not lead to significant changes in scores.…”
Section: Introduction
confidence: 99%
“…Motivated by the previous studies on testing automatic scoring systems [29, 20, 22], which show that AES models are vulnerable to atypical inputs, our aim is to gain some intuition into how models score a human-written sample. For instance, these studies show that automatic scoring systems score high on construct-irrelevant inputs like speeches and false facts [22], gibberish text [17], repeated paragraphs and canned responses [20], etc., but do not show why the models award high scores in these cases.…”
Section: Introduction
confidence: 99%