Evidence-centered design (ECD) is a framework for the design and development of assessments that ensures the consideration and collection of validity evidence from the outset of test design. Blending learning and assessment requires integrating aspects of learning at the same level of rigor as aspects of testing. In this paper, we describe an expansion of the ECD framework (termed e-ECD) that includes specifications of the relevant aspects of learning at each of the three core models in ECD, as well as making room for specifying the relationship between learning and assessment within the system. The framework proposed here does not assume a specific learning theory or particular learning goals; rather, it allows for their inclusion within an assessment framework, such that they can be articulated by researchers or assessment developers who wish to focus on learning.
Two consistent findings from the study of the fit between judgments of performance and actual performance are general overconfidence and the hard-easy effect, with overconfidence being greater for more difficult stimuli. These findings are based on aggregated analyses of confidence and accuracy, despite the fact that confidence judgments are individual and provided at the item level. Furthermore, an important characteristic of item performance judgments that is ignored by traditional analyses is that the objective difficulty of any item can be estimated before it is administered to a person. We argue that traditional analyses confound possible bias in subjective estimates of item difficulty (i.e., confidence judgments) with variation in the objective difficulty of items. We propose a multilevel approach to the analysis of confidence judgments, whereby the probability of a correct response is modeled as a function of both objective difficulty and subjectively judged difficulty. In this model, the intercept represents the possible overall bias (over- or underconfidence) in subjective difficulty judgments, after controlling for objective difficulty as well as variation across persons and items. In effect, we are proposing a new, more nuanced standard for defining calibration and identifying distinct patterns of miscalibration. We demonstrate the confounding effects of conventional aggregated analysis through synthetic examples and apply the proposed approach to the analysis of empirical data. Conventional analyses replicated the overall overconfidence and the hard-easy effect, but the item response modeling results did not identify an overall bias in confidence judgments or a test difficulty effect.
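To make the proposed model concrete, a minimal sketch of the specification described above, in our own notation (the paper's exact parameterization may differ): for person p answering item i, with confidence judgment c_{pi} and pre-estimated objective difficulty d_i,

    \operatorname{logit} P(y_{pi} = 1) = \beta_0 + \beta_1\, c_{pi} + \beta_2\, d_i + \theta_p + \varepsilon_i,
    \qquad \theta_p \sim N(0, \sigma^2_\theta), \quad \varepsilon_i \sim N(0, \sigma^2_\varepsilon).

Here \theta_p and \varepsilon_i are person and item random effects, and a nonzero intercept \beta_0 signals systematic over- or underconfidence after objective difficulty and person and item variation have been accounted for.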
Learning progressions (LPs) have attracted growing interest in recent years because of their potential benefits in the development of formative assessments for classroom use. Using an LP as the backbone of an assessment can yield diagnostic classifications of students that can guide instruction and remediation. In operationalizing an LP, assessment items are classified as measuring specific LP levels and, through the application of a measurement model, students are classified as masters of specific LP levels. To support the use of LPs in instructional planning and formative assessment, the reliability and validity of both item and student classifications should stand up to scrutiny. Reliability of classifications refers to their consistency; validity of classifications refers to their alignment with test data. A framework for testing these classifications is proposed and implemented in a validation study of a rational number LP for elementary school mathematics. As part of this study, 400 items were classified by LP level of understanding, a cognitive diagnostic model of student mastery level within the LP was fitted to the data, and analyses were conducted to assess the reliability and validity of these classifications.
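To illustrate the sense of consistency at issue, here is a minimal Python sketch of one possible reliability check (the data and the split-half design are hypothetical; the study itself works through a cognitive diagnostic model rather than this simple agreement check):

    # Agreement between LP-level classifications of the same students
    # obtained from two parallel test halves (hypothetical data).
    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    half_a = np.array([2, 3, 3, 4, 1, 5, 2, 3])  # LP levels, form A
    half_b = np.array([2, 3, 4, 4, 1, 5, 2, 2])  # LP levels, form B

    exact = np.mean(half_a == half_b)            # raw agreement rate
    kappa = cohen_kappa_score(half_a, half_b)    # chance-corrected
    print(f"exact agreement = {exact:.2f}, kappa = {kappa:.2f}")

Higher kappa indicates that student classifications are stable across forms, which is the sense in which reliability of classifications refers to their consistency.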
With the rise of more interactive assessments, such as simulation- and game-based assessments, process data are available for learning about students' cognitive processes as well as motivational aspects. Because process data can be complicated by interdependencies over time, traditional psychometric models may not fit, and additional ways of analyzing such data are needed. In this study, we draw process data from a study of a self-adapted test under different goal conditions (Arieli-Attali, 2016) and use hidden Markov models (HMMs) to learn about test takers' choice-making behavior. A self-adapted test allows test takers to choose the difficulty level of the items they receive. The data include test results from two goal orientation conditions (performance goal and learning goal), as well as confidence ratings on each question. We show that, using an HMM, we can estimate transition probabilities from one state to another as a function of goal orientation, accumulated score, accumulated confidence, and their interactions. The implications of such insights are discussed.
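As a sketch of the estimation step, the following Python fragment fits a two-state categorical HMM to difficulty-choice sequences using the hmmlearn package (our choice for illustration; the sequences, coding, and state count are hypothetical, and covariates such as goal condition would in practice be handled by fitting per-condition models or a richer input-output HMM):

    # Sketch: fitting a two-state HMM to per-item difficulty choices,
    # coded easy/medium/hard = 0/1/2. Assumes hmmlearn >= 0.3.
    import numpy as np
    from hmmlearn import hmm

    # Hypothetical data: one difficulty-choice sequence per test taker.
    sequences = [
        np.array([0, 0, 1, 1, 2, 2, 2, 1]),  # a taker ramping up
        np.array([2, 2, 1, 1, 1, 0, 0, 0]),  # a taker backing off
    ]
    X = np.concatenate(sequences).reshape(-1, 1)
    lengths = [len(s) for s in sequences]

    model = hmm.CategoricalHMM(n_components=2, n_iter=100, random_state=0)
    model.fit(X, lengths)

    print(model.transmat_)       # state-to-state transition probabilities
    print(model.emissionprob_)   # difficulty-choice probabilities per state

The fitted transition matrix is what supports statements about how test takers move between latent choice states over the course of the test.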
Prior work on the CBAL™ mathematics competency model resulted in an initial competency model for the middle school grades, with several learning progressions (LPs) that elaborate central ideas in the competency model and provide a basis for connecting summative and formative assessment. In the current project, we created a competency model for Grades 3–5 that is based on both the middle school competency model and the Common Core State Standards (CCSS). We also developed an LP for rational numbers based on an extensive literature review; consultations with members of the CBAL mathematics team and other related research staff at Educational Testing Service; input from an advisory panel of external experts in mathematics education and cognitive psychology; and small-scale cognitive interviews with students and teachers. Elementary mathematical understanding, specifically of rational numbers, is viewed as fundamental and critical to developing future knowledge and skill in middle and high school mathematics and is therefore essential for success in the 21st-century world. The competency model and the rational number LP serve as the conceptual basis for developing and connecting summative and formative assessments as well as professional support materials for Grades 3–5. We report here on the development process of these models and the implications for future task development.