2021
DOI: 10.1007/s41237-021-00144-w

A multidimensional generalized many-facet Rasch model for rubric-based performance assessment

Abstract: Performance assessment, in which human raters assess examinee performance in a practical task, often involves the use of a scoring rubric consisting of multiple evaluation items to increase the objectivity of evaluation. However, even when using a rubric, assigned scores are known to depend on characteristics of the rubric’s evaluation items and the raters, thus decreasing ability measurement accuracy. To resolve this problem, item response theory (IRT) models that can estimate examinee ability while considering…

Cited by 7 publications (7 citation statements) · References 57 publications (53 reference statements)
“…The MFRM is the most common type of model used for IRT with rater parameters (Linacre, 1989). Furthermore, there are various alternative models such as a two-parameter logistic model with rater severity parameters (Patz & Junker, 1999), generalized partial credit models incorporating various rater parameters (Uto, 2021b; Uto & Ueno, 2020), hierarchical rater models (DeCarlo, Kim, & Johnson, 2011; Patz, Junker, Johnson, & Mariano, 2002; Qiu, Chiu, Wang, & Chen, 2022), extensions based on signal detection models (DeCarlo, 2005; Soo Park & Xing, 2019), rater bundle models (Wilson & Hoskens, 2001), and trifactor models (Shin et al., 2019). However, this study focuses on the MFRM because it is the most widely used and well-established of these models.…”
Section: Many-Facet Rasch Models for Rater Severity Drift (mentioning, confidence: 99%)
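For context, a standard rating-scale formulation of the MFRM referenced in this statement (Linacre, 1989) models the log-odds of adjacent rating categories as an additive sum of facet terms. The notation below is a common textbook presentation, not taken from the paper itself:

```latex
% Many-facet Rasch model (rating-scale form): log-odds that examinee n
% receives category k rather than k-1 from rater j on evaluation item i.
\log \frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \beta_i - \rho_j - \tau_k
% theta_n : examinee ability      beta_i : evaluation-item difficulty
% rho_j   : rater severity        tau_k  : threshold for category k
```

Because each facet enters additively on the logit scale, rater severity can be separated from examinee ability during estimation, which is what the severity-drift extensions discussed in the citing paper build on.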
“…For complex models, however, it is generally infeasible to derive or compute the marginal posterior distribution because doing so involves high-dimensional multiple integrals. MCMC, a random sampling-based estimation method, has been widely used across many fields to address this problem, including in IRT studies (Brooks, Gelman, Jones, & Meng, 2011; Fontanella et al., 2019; Fox, 2010; Uto, 2021b; Uto & Ueno, 2020; van Lier et al., 2018; Zhang, Xie, You, & Huang, 2011).…”
Section: Proposed Model (mentioning, confidence: 99%)
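To make the sampling idea above concrete, here is a minimal random-walk Metropolis sketch in Python for the ability parameters of a dichotomous Rasch model with a rater-severity facet. All names (loglik_n, sample_theta, theta, beta, rho, U) are illustrative assumptions, not code from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)

def loglik_n(theta_n, beta, rho, U_n):
    # Log-likelihood of one examinee's 0/1 ratings U_n (items x raters)
    # under P(U = 1) = sigmoid(theta_n - beta_i - rho_j).
    eta = theta_n - beta[:, None] - rho[None, :]
    p = 1.0 / (1.0 + np.exp(-eta))
    return np.sum(U_n * np.log(p) + (1.0 - U_n) * np.log1p(-p))

def sample_theta(beta, rho, U, n_iter=2000, step=0.5):
    # Random-walk Metropolis over each examinee's ability with a N(0, 1)
    # prior; item difficulties beta and rater severities rho are held
    # fixed here to keep the sketch short.
    N = U.shape[0]
    theta = np.zeros(N)
    draws = np.empty((n_iter, N))
    for t in range(n_iter):
        for n in range(N):
            prop = theta[n] + step * rng.standard_normal()
            log_ratio = (loglik_n(prop, beta, rho, U[n])
                         - loglik_n(theta[n], beta, rho, U[n])
                         - 0.5 * prop**2 + 0.5 * theta[n]**2)
            if np.log(rng.uniform()) < log_ratio:
                theta[n] = prop
        draws[t] = theta
    return draws

# Averaging post-burn-in draws gives an EAP-style point estimate:
# theta_hat = sample_theta(beta, rho, U)[500:].mean(axis=0)
```

In practice, full Bayesian estimation of such models samples all facets jointly, typically with off-the-shelf samplers such as NUTS in Stan or PyMC rather than a hand-rolled Metropolis loop.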
“…The unbiased essay scores θ_j in the model can be estimated from observed essay rating data U while accounting for rater bias effects, in a manner similar to the traditional GPCM, which can estimate examinee abilities while accounting for the effects of item characteristics. IRT models with rater parameters, including the GMFRM, have been widely used for various performance tests, including essay writing tests and speaking tests, not only to realize accurate ability or score estimation but also to analyze the effects of various bias factors such as rater bias (e.g., [8]–[13], [35], [41]–[43], [58]).…”
Section: B. IRT Models with Rater Parameters (mentioning, confidence: 99%)
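As a rough illustration of the kind of model this statement describes, a GPCM extended with rater parameters can be written as below. The exact parameterization of the GMFRM differs across the cited papers, so treat this as a representative sketch rather than the model itself:

```latex
% GPCM-style model with rater parameters: probability that rater r
% assigns category k to examinee j on evaluation item i.
P(x_{ijr} = k) =
  \frac{\exp \sum_{m=1}^{k} \alpha_i \alpha_r (\theta_j - \beta_i - \beta_r - d_{im})}
       {\sum_{l=1}^{K} \exp \sum_{m=1}^{l} \alpha_i \alpha_r (\theta_j - \beta_i - \beta_r - d_{im})}
% alpha_i, alpha_r : item and rater discrimination
% beta_i,  beta_r  : item difficulty and rater severity
% d_{im}           : step parameter for category m of item i
```

The rater terms play the same structural role as the item terms, which is why ability estimates from such models are adjusted for rater bias in the same way the GPCM adjusts for item characteristics.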
“…In short, one cannot simply average the scores and call the output knowledge or learning. While some attempts have been made in the psychometric literature to measure knowledge with rubrics (e.g., Uto, 2021), to our knowledge, none have explicitly modeled the censoring problem. In this paper, we show how to model and estimate this problem in a robust, understandable way.…”
Section: Introduction (mentioning, confidence: 99%)
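For readers unfamiliar with the censoring issue raised here: when a rubric caps scores at its lowest and highest categories, observations piled at the bounds understate variation in the underlying trait, so simple averages are biased. A generic Tobit-style formulation (an assumption for illustration, not necessarily the citing paper's model) is:

```latex
% Latent score with observation censored at the rubric bounds:
y_n^{*} = \theta_n + \varepsilon_n, \qquad \varepsilon_n \sim N(0, \sigma^2)
y_n =
  \begin{cases}
    c_{\min} & \text{if } y_n^{*} \le c_{\min} \\
    y_n^{*}  & \text{if } c_{\min} < y_n^{*} < c_{\max} \\
    c_{\max} & \text{if } y_n^{*} \ge c_{\max}
  \end{cases}
% Bound observations contribute normal-CDF terms to the likelihood
% rather than density terms, which is what corrects the bias.
```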