Integration of Prediction Scores From Various Automated Essay Scoring Models Using Item Response Theory

Uto, Masaki; Aomi, Itsuki; Tsutsumi, Emiko; Ueno, Maomi

doi:10.1109/tlt.2023.3253215

Cited by 6 publications

(6 citation statements)

References 88 publications

“…In such cases, it is not easy to deduce which model should be used for the final AES. Consequently, instead of relying on a single AES model, incorporating predictions from multiple AES models in a suitable manner is expected to enhance scoring accuracy (Sagi & Rokach, 2018; Uto et al, 2023). An easy way to do this is simply by averaging the scores from the individual AES models or by adopting the majority vote.…”

Section: Automated Content Scoring (mentioning, confidence: 99%)
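
The two baseline integration rules mentioned in this statement, averaging the individual AES scores and taking a majority vote, can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the cited works. The function names, the half-up rounding of the mean, and the tie-breaking toward the lower score are assumptions chosen only to make the output deterministic.

```python
import math
from collections import Counter


def average_score(predictions):
    """Mean of the per-model scores, rounded half-up to the nearest integer score."""
    return math.floor(sum(predictions) / len(predictions) + 0.5)


def majority_vote(predictions):
    """Most frequent predicted score; ties go to the lower score (an arbitrary choice)."""
    counts = Counter(predictions)
    top = max(counts.values())
    return min(score for score, count in counts.items() if count == top)


# Hypothetical predictions from four AES models for one essay on a 1-6 scale.
preds = [4, 5, 4, 3]
print(average_score(preds))  # mean 4.0 -> 4
print(majority_vote(preds))  # 4 occurs twice -> 4
```

Both rules treat every model as equally reliable. That limitation is what motivates the IRT-based integration discussed in the next statement.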

“…In their recent work, Uto et al (2023) used a generalized many-facet Rasch model to integrate prediction scores from various AES models that were trained based on the ASAP dataset (Hamner et al, 2012) to predict a holistic score of writing quality. Their results showed that the proposed method achieved higher accuracy than that of individual AES models and conventional score-integration methods.…”

Section: Automated Content Scoring (mentioning, confidence: 99%)
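
The idea of treating each AES model as a rater in an IRT model can be sketched roughly as follows. The sketch assumes a single prompt and a generalized-partial-credit-style many-facet model, P(X_r = k | theta) ∝ exp(sum_{m=1..k} alpha_r (theta - beta_r - d_rm)). In this model, each AES model r has a consistency parameter alpha_r, a severity parameter beta_r, and rater-specific step parameters d_rm with d_r1 = 0.

This is not the exact formulation or estimation procedure of Uto et al. (2023):

- All parameter values and model names are hypothetical and are treated as already estimated.
- Ability theta is found by a MAP grid search under an assumed standard normal prior.
- The integrated score is reported as the expected score under a hypothetical neutral reference rater.

```python
import math

# Hypothetical parameters for three AES models on a 1-4 scale:
# (alpha_r = consistency, beta_r = severity, d_r = step parameters with d_r1 = 0).
AES_MODELS = {
    "model_a": (1.0, 0.0, [0.0, -1.0, 0.0, 1.0]),
    "model_b": (1.5, 0.3, [0.0, -0.8, 0.1, 0.9]),
    "model_c": (0.6, -0.4, [0.0, -1.2, 0.0, 1.2]),
}
NEUTRAL = (1.0, 0.0, [0.0, -1.0, 0.0, 1.0])  # hypothetical reference rater


def category_probs(theta, alpha, beta, steps):
    """P(score = k | theta) for k = 1..K under a GPCM-style rater model."""
    logits, cum = [], 0.0
    for d in steps:
        cum += alpha * (theta - beta - d)  # cumulative sum over steps m = 1..k
        logits.append(cum)
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_posterior(theta, scores, models):
    """Log-likelihood of the observed AES scores plus a standard normal log-prior."""
    ll = sum(math.log(category_probs(theta, *models[m])[s - 1]) for m, s in scores.items())
    return ll - 0.5 * theta ** 2


def estimate_theta(scores, models, lo=-4.0, hi=4.0, n=801):
    """MAP estimate of the essay's ability theta by grid search."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(grid, key=lambda t: log_posterior(t, scores, models))


def expected_score(theta, rater=NEUTRAL):
    """Expected score on the 1..K scale for a given theta under the reference rater."""
    probs = category_probs(theta, *rater)
    return sum((k + 1) * p for k, p in enumerate(probs))


# Usage: integrate the three models' predictions for one essay.
preds = {"model_a": 3, "model_b": 2, "model_c": 4}
theta_hat = estimate_theta(preds, AES_MODELS)
print(round(theta_hat, 2), round(expected_score(theta_hat), 2))
```

In this formulation a model with a larger alpha_r contributes more to the gradient of the log-likelihood. Meanwhile, beta_r offsets a model's systematic severity or leniency. This is the intuition behind IRT-based integration as opposed to plain averaging.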

A Hierarchical Rater Model Approach for Integrating Automated Essay Scoring Models

Fink, Gombert, Liu et al. 2024

Zeitschrift für Psychologie

Essay writing tests, integral in many educational settings, demand significant resources for manual scoring. Automated essay scoring (AES) can alleviate this by automating the process, thereby reducing human effort. However, the multitude of AES models, each varying in its features and scoring approaches, complicates selecting one optimal model, especially when evaluating diverse content-related aspects across multiple rating items. Therefore, we propose a hierarchical rater model-based approach to integrate predictions from multiple AES models, accounting for their distinct scoring behaviors. We investigated its performance on data from a university essay writing test. The proposed method achieved accuracy that was comparable to the best individual AES model. This is a promising result because it additionally reduced the amount of differential item functioning between human and automated scoring and thus established a higher degree of measurement invariance compared to the individual AES models.
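
The core of a hierarchical rater model (HRM) is a signal-detection level. In it, each rater's observed score k is modeled as a noisy reading of the essay's ideal rating xi, with P(k | xi) ∝ exp(-(k - (xi + phi_r))^2 / (2 psi_r^2)), where phi_r is the rater's bias and psi_r its variability.

The sketch below applies only that level, treating AES models as raters. It is a simplification for illustration, not the method of Fink et al. (2024):

- The phi_r and psi_r values and model names are hypothetical and treated as already estimated.
- The HRM's upper IRT level is replaced by a uniform prior over ideal ratings.
- The integrated score is taken to be the posterior mode.

```python
import math

CATEGORIES = [1, 2, 3, 4]  # hypothetical score scale

# Hypothetical rater-level parameters per AES model: (phi_r = bias, psi_r = variability).
AES_MODELS = {"model_a": (0.0, 0.5), "model_b": (0.4, 0.7), "model_c": (-0.3, 1.0)}


def rating_probs(xi, phi, psi):
    """P(observed rating = k | ideal rating xi) under the HRM signal-detection level."""
    w = [math.exp(-((k - (xi + phi)) ** 2) / (2 * psi ** 2)) for k in CATEGORIES]
    total = sum(w)
    return [x / total for x in w]


def ideal_rating_posterior(scores, models, prior=None):
    """Posterior over the ideal rating xi given each AES model's observed score."""
    # Uniform prior over ideal ratings: an assumption standing in for the HRM's IRT level.
    prior = prior or [1.0 / len(CATEGORIES)] * len(CATEGORIES)
    post = []
    for xi, p in zip(CATEGORIES, prior):
        like = 1.0
        for model, score in scores.items():
            phi, psi = models[model]
            like *= rating_probs(xi, phi, psi)[CATEGORIES.index(score)]
        post.append(p * like)
    total = sum(post)
    return [x / total for x in post]


def integrate(scores, models):
    """Integrated score = posterior mode of the ideal rating."""
    post = ideal_rating_posterior(scores, models)
    return CATEGORIES[max(range(len(post)), key=post.__getitem__)]


# Usage: model_b's 4 is partly discounted because of its positive bias phi_r.
preds = {"model_a": 3, "model_b": 4, "model_c": 3}
print(integrate(preds, AES_MODELS))
```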

“…Liu et al [29] designed a Two-Stage Learning Framework (TSLF) to extract semantic features, fluency features, and relevance features through the neural network model, fusing artificial features for scoring. Uto et al [30] proposed a fusion method that utilizes item response theory to consider differences in scoring behavioral characteristics and integrate prediction scores from various AES models.…”

Section: AES Based on Hybrid Model (mentioning, confidence: 99%)

Automatic Essay Scoring Method Based on Multi-Scale Features

Cui et al. 2023

Applied Sciences

Essays are a pivotal component of conventional exams; accurately, efficiently, and effectively grading them is a significant challenge for educators. Automated essay scoring (AES) is a complex task that utilizes computer technology to assist teachers in scoring. Traditional AES techniques only focus on shallow linguistic features based on the grading criteria, ignoring the influence of deep semantic features. The AES model based on deep neural networks (DNN) can eliminate the need for feature engineering and achieve better accuracy. In addition, the DNN-AES model combining different scales of essays has recently achieved excellent results. However, it has the following problems: (1) It mainly extracts sentence-scale features manually and cannot be fine-tuned for specific tasks. (2) It does not consider the shallow linguistic features that the DNN-AES cannot extract. (3) It does not contain the relevance between the essay and the corresponding prompt. To solve these problems, we propose an AES method based on multi-scale features. Specifically, we utilize Sentence-BERT (SBERT) to vectorize sentences and connect them to the DNN-AES model. Furthermore, the typical shallow linguistic features and prompt-related features are integrated into the distributed features of the essay. The experimental results show that the Quadratic Weighted Kappa of our proposed method on the Kaggle ASAP competition dataset reaches 79.3%, verifying the efficacy of the extended method in the AES task.
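
Quadratic Weighted Kappa (QWK), the metric reported in this abstract, is normally expressed on a -1 to 1 scale, so the 79.3% figure corresponds to a QWK of 0.793. The sketch below computes QWK from scratch in the form commonly used for the Kaggle ASAP data. It compares observed disagreement with the disagreement expected from the two raters' marginal score distributions, weighting each cell by its squared distance from the diagonal. The function name and the explicit score-range arguments are illustrative choices. scikit-learn's `cohen_kappa_score` with `weights="quadratic"` computes the same statistic.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_score, max_score):
    """QWK between two integer ratings on the scale min_score..max_score."""
    if max_score <= min_score:
        raise ValueError("max_score must exceed min_score")
    n = max_score - min_score + 1
    # Observed confusion matrix of (human, machine) score pairs.
    observed = [[0.0] * n for _ in range(n)]
    for a, b in zip(y_true, y_pred):
        observed[a - min_score][b - min_score] += 1
    total = len(y_true)
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2  # quadratic disagreement weight
            expected = hist_true[i] * hist_pred[j] / total  # chance agreement from marginals
            num += w * observed[i][j]
            den += w * expected
    return 1.0 - num / den


print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4))
```

The denominator is zero when both raters assign a single identical score to every essay, so a production version would need to handle that case explicitly.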

“…The AES tools' correlations and agreement with human raters have become fairly high (Ifenthaler, 2022; Link & Koltovskaia, 2023; Warschauer & Ware, 2006). State-of-the-art models report quadratic weighted Kappas ranging from .57 to .80, with most in the low .70's, evidencing substantial agreement between the models and human raters (Beseiso et al, 2021; Uto et al, 2023). Many of these studies highlight the results of adjacent agreement between humans and AES systems rather than those of exact agreement (Ifenthaler & Dikli, 2015).…”

Section: Introduction (mentioning, confidence: 99%)
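
The distinction this statement draws between exact and adjacent agreement can be made concrete with a short sketch. Exact agreement is the share of essays where human and machine scores are identical. Adjacent agreement is taken here in its common inclusive sense, as scores differing by at most one point. Some studies instead count only scores exactly one point apart, so that definition is an assumption of this sketch. The function name and sample scores are hypothetical.

```python
def agreement_rates(human, machine):
    """Return (exact, adjacent) agreement; adjacent counts |difference| <= 1."""
    if len(human) != len(machine) or not human:
        raise ValueError("need two non-empty score lists of equal length")
    pairs = list(zip(human, machine))
    exact = sum(h == m for h, m in pairs) / len(pairs)
    adjacent = sum(abs(h - m) <= 1 for h, m in pairs) / len(pairs)
    return exact, adjacent


# Hypothetical human and AES scores for four essays.
print(agreement_rates([3, 4, 2, 5], [3, 5, 4, 5]))  # (0.5, 0.75)
```

Because adjacent agreement is at least as high as exact agreement by construction, reporting only the former makes AES systems look more accurate. This is the concern the quoted passage raises.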

Can AI provide useful holistic essay scoring?

Tate, Steiss, Bailey et al. 2024

Computers and Education: Artificial Intelligence
