2020
DOI: 10.3758/s13428-020-01498-x

Accuracy of performance-test linking based on a many-facet Rasch model

Abstract: Performance assessments, in which human raters assess examinee performance in practical tasks, have attracted much attention in various assessment contexts involving measurement of higher-order abilities. However, difficulty persists in that ability measurement accuracy strongly depends on rater and task characteristics such as rater severity and task difficulty. To resolve this problem, various item response theory (IRT) models incorporating rater and task parameters, including many-facet Rasch models (MFRMs)…
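The MFRMs named in the abstract model the probability of each rating category from examinee ability, task difficulty, and rater severity. A minimal sketch of one common adjacent-category formulation (all parameter values below are illustrative assumptions, not taken from the paper):

```python
import math

def mfrm_category_probs(theta, task_difficulty, rater_severity, thresholds):
    """Category probabilities under a rating-scale many-facet Rasch model.

    Adjacent-category logit: the log-odds of category k over k-1 is
    theta - task_difficulty - rater_severity - thresholds[k-1].
    """
    # Cumulative adjacent-category logits; category 0 is fixed at 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + theta - task_difficulty - rater_severity - tau)
    denom = sum(math.exp(l) for l in logits)
    return [math.exp(l) / denom for l in logits]

# Illustrative case: able examinee, easy task, lenient rater -> high
# categories become the most probable.
probs = mfrm_category_probs(theta=1.0, task_difficulty=-0.5,
                            rater_severity=-0.2, thresholds=[-1.0, 0.0, 1.0])
```

Increasing rater severity shifts probability mass toward lower categories, which is exactly the rater effect the paper says distorts ability measurement.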

Cited by 16 publications
(8 citation statements)
References 46 publications
“…1) To appropriately estimate the item parameters based on the original GMFRM while ensuring parameter linking, we require a scored essay dataset in which some examinees answered all the items [83], [84]. However, almost none of the existing datasets that are used for AES studies include such examinees.…”
Section: A Generalized Many-facet Rasch Model
confidence: 99%
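The linking condition quoted above — a dataset in which some examinees answered all the items — can be checked mechanically. A sketch over a hypothetical response record (the examinee and item IDs are made up for illustration):

```python
def examinees_answering_all_items(responses):
    """Return examinees with a score for every item appearing in the dataset.

    `responses` maps examinee ID -> set of item IDs that examinee answered.
    Such examinees are what the quoted linking condition requires.
    """
    all_items = set().union(*responses.values())
    return {e for e, items in responses.items() if items == all_items}

# Hypothetical dataset: only "e1" answered all three items.
data = {"e1": {"i1", "i2", "i3"},
        "e2": {"i1", "i2"},
        "e3": {"i2", "i3"}}
linkers = examinees_answering_all_items(data)
```

If the returned set is empty, the dataset cannot satisfy the condition, which is the situation the excerpt reports for most existing AES datasets.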
“…TAM can estimate student and item measures. The probability that a student will respond to an item correctly is determined by the difference between the student's achievement level and the item's difficulty [15].…”
Section: Measurement Model
confidence: 99%
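The relationship described in this excerpt is the dichotomous Rasch model: correctness probability depends only on the difference between ability and difficulty. A minimal sketch:

```python
import math

def rasch_probability(ability, difficulty):
    """Rasch model: P(correct) is a logistic function of ability - difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))
```

When ability equals difficulty the probability is exactly 0.5, and it rises toward 1 as ability exceeds difficulty.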
“…Thus, we conducted the same experiment as described above, assuming a practical situation in which few raters are assigned to each examinee. Concretely, in Procedure 2, we first assigned two raters to each examinee based on a systematic link design (Shin et al 2019;Uto 2020;Wind and Jones 2019), and then we generated the data based on the rater assignment. Examples of a fully crossed design and a systematic link design are illustrated in Tables 6 and 7, where checkmarks indicate an assigned rater, and blank cells indicate that no rater was assigned.…”
Section: Accuracy Of Ability Measurement
confidence: 99%
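A systematic link design like the one this excerpt describes assigns overlapping rater subsets so that all raters end up chained onto a common scale despite each examinee being rated by only a few of them. The rotation below is a sketch of one common such pattern, not necessarily the exact design used in the cited experiment:

```python
def systematic_link_assignment(n_examinees, n_raters, raters_per_examinee=2):
    """Assign raters to examinees in a rotating pattern.

    Consecutive examinees share a rater, so every rater is linked to every
    other through a chain of shared examinees (a common link design; the
    exact design in the paper may differ).
    """
    assignment = {}
    for e in range(n_examinees):
        start = e % n_raters
        assignment[e] = [(start + r) % n_raters
                         for r in range(raters_per_examinee)]
    return assignment

# 6 examinees, 4 raters, 2 raters each: rater pairs overlap and wrap around,
# mirroring the checkmark pattern of a systematic link table.
plan = systematic_link_assignment(6, 4)
```

In contrast, a fully crossed design would assign every rater to every examinee, which is what the link design avoids to reduce rating workload.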
“…The written essays were evaluated by 18 raters using a rubric consisting of 9 evaluation items divided into 4 rating categories. We assigned four raters to each essay based on a systematic link design (Shin et al 2019;Uto 2020;Wind and Jones 2019) to reduce the raters' assessment workload. The evaluation items column in Table 9 summarizes the evaluation items in the rubric, which was created based on two writing assessment rubrics proposed by Matsushita et al (2013) and Nakajima (2017) for Japanese university students.…”
Section: Actual Data
confidence: 99%