Intraclass correlation reliability estimates are based on the assumption that the various measures are equivalent. Jöreskog's (1970) general model for the analysis of covariance structures can be used to test the validity of this assumption.
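As a rough illustration of the kind of estimate the abstract above refers to, the following sketch computes intraclass correlations from a one-way analysis of variance. The abstract does not say which ICC form is meant; the one-way random-effects form is assumed here, and the function name `icc_oneway` and the sample scores are hypothetical. The calculation treats the k measures as interchangeable, which is the equivalence assumption the abstract says should be tested.

```python
def icc_oneway(scores):
    """One-way random-effects ICC (assumed form, not taken from the abstract).

    scores: list of rows, one row per subject, each row holding k measures.
    Returns (single_measure_icc, average_measure_icc).
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of measures per subject
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Between-subject and within-subject sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    )

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    # Reliability of a single measure, and of the mean of the k measures.
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average


# Hypothetical data: 4 subjects, 3 measures each.
example = [[7, 8, 6], [4, 5, 5], [9, 9, 8], [2, 3, 3]]
single_icc, average_icc = icc_oneway(example)
```

The two returned values are linked by the Spearman-Brown relation, so the average-measure value can also be obtained by stepping up the single-measure value to length k. Neither value says anything about whether the measures really are equivalent; that is a separate model-fit question.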
Uses of tests and assessments as key elements in five waves of educational reform during the past 50 years are reviewed. These waves include the role of tests in tracking and selection emphasized in the 1950s, the use of tests for program accountability in the 1960s, minimum competency testing programs of the 1970s, school and district accountability of the 1980s, and the standards-based accountability systems of the 1990s. Questions regarding the impact, validity, and generalizability of reported gains, and the credibility of results in high-stakes accountability uses are discussed. Emphasis is given to three issues regarding currently popular accountability systems. These are (a) the role of content standards, (b) the dual goals of high performance standards and common standards for all students, and (c) the validity of accountability models. Some suggestions for dealing with the most severe limitations of accountability are provided.
Guidelines are proposed for evaluating a computerized adaptive test. Topics include dimensionality, measurement error, validity, estimation of item parameters, item pool characteristics and human factors. Equating CAT and conventional tests is considered and matters of equity are addressed.