1995
DOI: 10.1177/014662169501900107

Complex Composites: Issues That Arise in Combining Different Modes of Assessment

Abstract: Data from the California Learning Assessment System are used to examine certain characteristics of tests designed as composites of items of different modes. The characteristics include rater severity, test information, and definition of the latent variable. Three assessment modes (multiple-choice, open-ended, and investigation items, the latter two referred to as performance-based modes) were combined in a test across three different test forms. Rater severity was investigated by incorporating…

Cited by 27 publications (18 citation statements); references 17 publications.
“…In line with earlier research on the statewide tests administered by the CDE (Wilson & Case, 2000; Wilson & Wang, 1995) and research by others from as early as the 1930s (e.g., Ashburn, 1938; Eells, 1930), significant variability between raters was found in their essay scoring. Moreover, significant variation was also found within raters over time, confirming findings reported by Braun (1988).…”
Section: Discussion (supporting; confidence: 86%)
“…An advantage of IRT is also that it can easily handle data obtained with a scoring scale of limited range. Several authors have used IRT models to document the impact of rater effects on student scores (e.g., Engelhard, 1994, 1996; Lunz, Wright, & Linacre, 1990; Myford & Mislevy, 1995; Patz, 1996; Wilson & Wang, 1995; Wolfe & Myford, 1997).…”
(mentioning; confidence: 99%)
“…This uncertainty, if not closely monitored, can have devastating consequences. Wilson and Wang (1995) have described the observed impact of varying rater severities on the scores of students in CLAS mathematics performance tasks: they found that the expected raw score differences resulting from scorer severity differences could be as large as 2.0 score points (on a 6-point scale), and that these raw score effects would result in a difference of 10 percentile points for typical students on the particular test under study. Koretz, Stecher, Klein, and McCaffrey (1994) show that Vermont's portfolio assessment program was unable to report school-level scores due primarily to unacceptably high variability between raters.…”
Section: Application: Model-Based Approaches to Rater Effects (mentioning; confidence: 99%)
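The severity effect described in the statement above can be made concrete with a small numerical sketch. The code below is not the CLAS analysis; it uses a generic rating scale model with hypothetical threshold values (`taus`) purely to illustrate how a shift in rater severity moves an examinee's expected raw score on a 0-6 scale.

```python
import math

def expected_score(theta, severity, thresholds):
    """Expected raw score under a simple rating scale model.

    Category k (k = 0..m) gets log-weight k*(theta - severity) - sum(tau_1..tau_k),
    where theta is examinee ability and severity is the rater's severity shift.
    """
    logits = [0.0]          # log-weight of category 0 is fixed at 0
    s = 0.0
    for tau in thresholds:  # accumulate k*(theta - severity) - sum of taus
        s += (theta - severity) - tau
        logits.append(s)
    probs = [math.exp(l) for l in logits]
    total = sum(probs)
    return sum(k * p / total for k, p in enumerate(probs))

# Hypothetical, symmetric thresholds for a 0-6 scale (illustration only):
taus = [-2.0, -1.0, -0.3, 0.3, 1.0, 2.0]

# Same examinee (theta = 0), scored by a lenient (-0.5) vs a severe (+0.5) rater:
gap = expected_score(0.0, -0.5, taus) - expected_score(0.0, 0.5, taus)
```

With any positive severity difference, the lenient rater yields the higher expected raw score; the size of `gap` depends entirely on the assumed thresholds, not on the CLAS data.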
“…Software to apply restricted cases of the LLTM (so-called facets models) has been developed by Linacre (1989), as has software that can estimate models specified under the full LLTM approach (Adams & Wilson, 1996; Adams, Wilson, & Wu, 1997; Ponocny & Ponocny-Seliger, 1997). The technique has been applied to rater effect estimation by Engelhard (1994, 1996), Myford and Mislevy (1995), and Wilson and Wang (1995). The LLTM rater model for a dichotomous item j taken by examinee i and rated by rater r has IRF

P(X_ijr = 1 | θ_i, β_j, ρ_r) = exp(θ_i − β_j − ρ_r) / [1 + exp(θ_i − β_j − ρ_r)]

where θ_i is examinee ability, β_j is item difficulty, and ρ_r is rater severity.…”
Section: IRT Models for Rater Effects (mentioning; confidence: 99%)
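The facets-model IRF quoted above can be sketched directly. The function below is a minimal illustration assuming the standard parameterization, with examinee ability `theta`, item difficulty `beta`, and rater severity `rho`; it is not tied to any particular software implementation named in the citation.

```python
import math

def facets_irf(theta, beta, rho):
    """Probability of a correct response under the dichotomous facets model:
    P = exp(theta - beta - rho) / (1 + exp(theta - beta - rho))."""
    z = theta - beta - rho
    return 1.0 / (1.0 + math.exp(-z))

# A more severe rater (larger rho) lowers the success probability
# for the same examinee and item:
lenient = facets_irf(theta=0.0, beta=0.0, rho=-0.5)
severe = facets_irf(theta=0.0, beta=0.0, rho=0.5)
```

Here rater severity acts exactly like an addition to item difficulty, which is why facets models can estimate it with the same machinery used for item parameters.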
“…Although recognizing advantages of performance-based examinations (i.e., simulations and performance observations) over multiple-choice examinations, Wilson & Wang (1995, pp. 52-53) discuss concerns about rater effects in performance-based examinations, including inter-rater variation in rater severity and within-rater variation of rater severity. Are some raters harder or easier than others?…”
Section: Introduction (mentioning; confidence: 99%)