Evidence-centered design (ECD) is a framework for assessment design and development that ensures validity evidence is considered and collected from the outset of test design. Blending learning and assessment requires integrating aspects of learning with the same rigor as aspects of testing. In this paper, we describe an expansion of the ECD framework (termed e-ECD) that includes specifications of the relevant aspects of learning at each of the three core models of the ECD, as well as making room for specifying the relationship between learning and assessment within the system. The framework proposed here does not assume a specific learning theory or particular learning goals; rather, it allows for their inclusion within an assessment framework, so that they can be articulated by researchers or assessment developers who wish to focus on learning.
Two consistent findings from studies of the fit between judged and actual performance are general overconfidence and the hard-easy effect, whereby overconfidence is greater for more difficult stimuli. These findings are based on aggregated analyses of confidence and accuracy, despite the fact that confidence judgments are individual and are provided at the item level. Furthermore, traditional analyses ignore an important characteristic of item performance judgments: the objective difficulty of any item can be estimated before it is administered to a person. We argue that traditional analyses confound possible bias in subjective estimates of item difficulty (i.e., confidence judgments) with variation in the objective difficulty of items. We propose a multilevel approach to the analysis of confidence judgments, whereby the probability of a correct response is modeled as a function of both objective difficulty and subjectively judged difficulty. In this model, the intercept represents the possible overall bias (over- or underconfidence) in subjective difficulty judgments, after controlling for objective difficulty as well as variation across persons and items. In effect, we propose a new, more nuanced standard for defining calibration and identifying distinct patterns of miscalibration. We demonstrate the confounding effects of conventional aggregated analysis through synthetic examples and apply the proposed approach to the analysis of empirical data. Conventional analyses replicated the overall overconfidence and the hard-easy effect, but the item response modeling results failed to identify an overall bias in confidence judgments or a test difficulty effect.
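The core of the proposed model can be illustrated with a toy simulation: the probability of a correct response is modeled as a logistic function of objective difficulty and subjectively judged difficulty, with the intercept capturing overall bias after both difficulty terms are controlled for. The sketch below is a simplified illustration, not the authors' analysis: it omits the crossed person and item random effects of a full multilevel model and fits a plain logistic regression to synthetic data; all variable names, sample sizes, and coefficient values are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: one row per person-item response (values are illustrative).
n = 8000
obj = rng.normal(0.0, 1.0, n)          # objective item difficulty (standardized)
subj = obj + rng.normal(0.0, 0.7, n)   # subjectively judged difficulty, a noisy
                                       # version of the objective difficulty

# Generating model: harder items (objectively or as judged) lower accuracy;
# the intercept plays the role of the overall bias term in the abstract.
b_true = np.array([0.2, -0.8, -0.4])   # intercept, objective, subjective
X = np.column_stack([np.ones(n), obj, subj])
p_true = 1.0 / (1.0 + np.exp(-X @ b_true))
y = (rng.random(n) < p_true).astype(float)

def fit_logistic(X, y, n_iter=30):
    """Logistic regression via iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        # Newton step: solve (X' W X) d = X' (y - p)
        step = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))
        beta += step
    return beta

beta = fit_logistic(X, y)
print(beta)  # [intercept, objective slope, subjective slope]
```

In a real analysis the random effects would matter: a crossed-random-effects logistic model (e.g., `glmer` in R's lme4, or a Bayesian equivalent) lets the intercept be interpreted as bias net of person ability and residual item variation, which is what distinguishes this approach from aggregated calibration plots.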