2019
DOI: 10.1111/jedm.12201

The Effects of Incomplete Rating Designs in Combination With Rater Effects

Abstract: Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of three rater effects (leniency, central tendency, and severity) in combination with different types of incomplete rating designs (systematic links, anchor performances, an…

Cited by 27 publications (26 citation statements)
References: 28 publications
“…To facilitate estimation of the examinee and rater parameters, I simulated a small “anchor set” of three examinees whose performances were rated by all of the raters (see Engelhard & Wind, 2018). This type of assessment design, in which raters score a common anchor set of examinees and then a single rater scores each examinee, is common in contexts such as teacher evaluation, where individual raters (e.g., principals) score teachers on several aspects of their teaching effectiveness, and each rater does not score any teachers in common with any other raters except for those included in the anchor set (e.g., Wind & Jones, 2019). This design is also relatively common in music performance assessments where a rater scores a single examinee performance on a variety of domains (e.g., Wesolowski et al., 2015).…”
Section: Methods
confidence: 99%
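
To make the anchor-set design in the excerpt above concrete, here is a minimal Python sketch that builds a hypothetical examinee-by-rater assignment matrix: a small anchor set of examinees is scored by every rater, and each remaining examinee is scored by exactly one rater. The function name, parameters, and defaults (e.g., three anchor examinees, random assignment of the single rater) are illustrative assumptions, not code from the cited studies.

    import numpy as np

    def anchor_set_design(n_examinees, n_raters, n_anchor=3, seed=0):
        # Hypothetical anchor-set rating design (assumed interface, not from the cited work).
        # Rows are examinees, columns are raters; 1 = rater scores that examinee, 0 = no rating.
        rng = np.random.default_rng(seed)
        design = np.zeros((n_examinees, n_raters), dtype=int)
        design[:n_anchor, :] = 1  # anchor performances: scored by every rater
        for i in range(n_anchor, n_examinees):
            # each non-anchor examinee is scored by a single, randomly chosen rater
            design[i, rng.integers(n_raters)] = 1
        return design

    # Example: 10 examinees, 4 raters, first 3 examinees form the anchor set
    print(anchor_set_design(n_examinees=10, n_raters=4))

In this layout, the raters are linked to one another only through the anchor rows, which is what keeps the otherwise disconnected single-rater assignments on a common scale.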
“…Thus, we conducted the same experiment as described above assuming a practice situation where few raters are assigned to each examinee. Concretely, in Procedure 2, we first assigned two raters to each examinee based on a systematic link design (Shin et al. 2019; Uto 2020; Wind and Jones 2019), and then we generated the data based on the rater assignment. The examples of a fully crossed design and a systematic link design are illustrated in Tables 6 and 7, where checkmarks indicate an assigned rater, and blank cells indicate that no rater was assigned.…”
Section: Accuracy Of Ability Measurement
confidence: 99%
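
As a rough illustration of the systematic-links idea in the excerpt above, the sketch below assigns a fixed number of consecutive raters to each examinee so that overlapping rater pairs chain the whole rater pool into one connected network; a fully crossed design would instead fill the matrix entirely with 1s. The function and its parameters are hypothetical and are not taken from Shin et al., Uto, or Wind and Jones.

    import numpy as np

    def systematic_link_design(n_examinees, n_raters, raters_per_examinee=2):
        # Hypothetical systematic-links rating design (assumed interface).
        # Examinee i is scored by raters_per_examinee consecutive raters starting
        # at rater (i mod n_raters), so adjacent examinees share a rater and the
        # rater network stays connected without a fully crossed design.
        design = np.zeros((n_examinees, n_raters), dtype=int)
        for i in range(n_examinees):
            start = i % n_raters
            for k in range(raters_per_examinee):
                design[i, (start + k) % n_raters] = 1
        return design

    # Example: two raters per examinee, as in the Procedure 2 excerpt above
    print(systematic_link_design(n_examinees=8, n_raters=4))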
“…The written essays were evaluated by 18 raters using a rubric consisting of 9 evaluation items divided into 4 rating categories. We assigned four raters to each essay based on a systematic links design (Shin et al. 2019; Uto 2020; Wind and Jones 2019) to reduce the raters' assessment workload. The evaluation items column in Table 9 lists the abstracts of the evaluation items in the rubric, and was created based on two writing assessment rubrics proposed by Matsushita et al. (2013) and Nakajima (2017) for Japanese university students.…”
Section: Actual Data
confidence: 99%
“…Connected designs such as those illustrated in Figure 1 are effective as a means for estimating student achievement in sparse rater-mediated assessment networks (Wind & Jones, 2017, 2019a). However, they present some challenges for effectively detecting rater effects, such as rater bias (i.e., differential rater functioning [DRF]).…”
confidence: 99%
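
Because the excerpt above turns on whether a sparse rating design is connected, one minimal way to check that property is to treat examinees and raters as the two node sets of a bipartite graph, add an edge for every assigned rating, and test whether a single traversal reaches every node that appears in the design. The helper below is an illustrative sketch under those assumptions, not an implementation from Wind and Jones (2017, 2019a).

    from collections import defaultdict, deque

    def design_is_connected(design):
        # design: examinee-by-rater matrix of 0/1 (any nested sequence works).
        # Returns True if all examinees and raters with at least one assignment
        # form a single connected network, i.e., they can be calibrated on a
        # common scale; rows or columns with no assignments are ignored.
        edges = defaultdict(set)
        for i, row in enumerate(design):
            for j, scored in enumerate(row):
                if scored:
                    edges[("examinee", i)].add(("rater", j))
                    edges[("rater", j)].add(("examinee", i))
        if not edges:
            return False
        start = next(iter(edges))
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in edges[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return len(seen) == len(edges)

    # Example: two single-rater assignments with no shared rater -> not connected
    print(design_is_connected([[1, 0], [0, 1]]))  # False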