2016
DOI: 10.1002/tea.21299
Validation of automated scoring of science assessments

Abstract: Constructed response items can both measure the coherence of student ideas and serve as reflective experiences to strengthen instruction. We report on new automated scoring technologies that can reduce the cost and complexity of scoring constructed-response items. This study explored the accuracy of c-rater-ML, an automated scoring engine developed by Educational Testing Service, for scoring eight science inquiry items that require students to use evidence to explain complex phenomena. Automated scoring showed …

Cited by 99 publications (81 citation statements) | References 33 publications
“…Sao Pedro, Baker, Gobert, Montalvo, and Nakama () described a set of “detectors” that could fairly reliably detect whether a student was performing controlled or uncontrolled experiments, whether the student's experiment was testing a hypothesis or not, and whether students were planning their behaviour (see also Gobert, Kim, Sao Pedro, Kennedy, & Betts, ; Gobert, Sao Pedro, Raziuddin, & Baker, ). Liu, Rios, Heilman, Gerard, and Linn () assessed the validity of a tool called c‐rater‐ML for the automatic scoring of students' open‐ended responses and found good agreement between the automatic and human expert ratings.…”
Section: The Next Steps in Technology-Based Guidance of the Inquiry P… (mentioning)
confidence: 99%
“…Liu, Rios, Heilman, Gerard, and Linn () identified 11 studies where automated text scoring assessed students’ responses to open‐ended science items. Of the 11 studies, nine addressed college students’ writing and two studies addressed middle‐school students’ writing.…”
Section: Literature Review (mentioning)
confidence: 99%
“…In our early trials we could reach average correlations of 0.37 between the prediction of abilities in a criterion and later-earned scores in an exam in this topic [24]. In contrast, the best correlation between a human rater and an automatic scoring system is about r=0.52 [25]. Given the fact that the LMSA Kit uses educational data stored in the LMS to predict results in a future exam, the smaller correlations are not surprising.…”
Section: The LMSA Kit Offers Insights Into Students' Learning (mentioning)
confidence: 99%
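The agreement figures quoted in these citation statements (for example, the roughly r = 0.52 human–machine correlation, and the "good agreement" between c-rater-ML and human expert ratings) are typically computed by comparing automated scores with human ratings of the same responses. The sketch below shows one common way to do that in Python, using Pearson correlation and quadratic weighted kappa; the score vectors are invented for illustration and are not data from the cited studies.

```python
# Minimal sketch: comparing automated scores with human ratings.
# The score vectors are hypothetical and used only to illustrate the
# statistics; they do not come from the cited papers.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# Hypothetical integer rubric scores (0-4) for the same ten responses.
human_scores = [3, 2, 4, 1, 0, 3, 2, 4, 1, 2]
automated_scores = [3, 2, 3, 1, 1, 3, 2, 4, 2, 2]

# Pearson correlation: the statistic behind r = 0.52 style comparisons.
r, p_value = pearsonr(human_scores, automated_scores)

# Quadratic weighted kappa: an agreement index commonly reported for
# automated scoring engines; it penalizes large disagreements more heavily.
qwk = cohen_kappa_score(human_scores, automated_scores, weights="quadratic")

print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")
print(f"Quadratic weighted kappa = {qwk:.2f}")
```

In practice, validation studies of this kind report such agreement statistics per item and compare them against the agreement observed between two independent human raters.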