The inferences drawn from many statistical analyses are not congruent with the analyses performed. The analysis commonly employed when cognitive or affective measures serve as the dependent variable yields an inference that is rigorously generalizable only to the specific set of scales or items that were employed. It does not generalize to the intended universe of items, of which that set is viewed as a representative sample. The arguments in this paper show that if the items are incorporated into the design as levels of a random facet via generalizability theory, the inferential question in the desired universe of inference can be examined statistically. The proposed analysis also yields an estimate of the reliability (generalizability) of the outcome variable; consequently, the statistical and measurement-fidelity questions are unified in a single analysis. The paper also sets out the advantages of nesting items within subjects.
Cronbach's generalizability theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972) has liberated measurement theory from its classical stance, which ignored latent random effects in the relevant universe of inference. Generalizability theory shows that classical test theory ordinarily underestimates the degree of measurement error in the appropriate universe of generalization. That is, the inferences drawn are not statistically congruent with those addressed by the reliability (generalizability) coefficient, because undefined random sources of variation (facets) in the system are not acknowledged.
In much research in the behavioral and social sciences, there is a related incongruity between the statistical analysis and the inference that it rigorously justifies. Unlike the classical applications of statistics in agriculture, the desired psychological and educational outcomes in behavioral research usually cannot be measured directly or exhaustively. The numbers associated with bushels, pounds, pigs per litter, and so forth differ fundamentally from cognitive and affective measures, and these differences have important implications for statistical analysis and interpretation. Items on tests and inventories are only a sample of the universe of items to which an inference is intended, whereas the agricultural measures per se involve no sampling.
Downloaded from http://aerj.aera.net at The University of Iowa Libraries on June 21, 2015
KENNETH D. HOPKINS
In the late 1940s, statisticians derived the distribution theory associated with "fixed," "random," and "mixed" analysis of variance (ANOVA) models. These models allow a statistical analysis to approach more nearly the appropriate universe of inference for the independent variables that were employed. Unfortunately, statistical applications in educational research have not extended this logic to include the sampling error associated with the dependent variable when that variable consists of scores on a test or inventory.
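Extending the random-model logic to the dependent variable means treating items as a random facet. In the simplest case, a fully crossed persons x items (p x i) design, this yields a reliability (generalizability) estimate from the same ANOVA. The Python/NumPy sketch below assumes one score per cell and treats both persons and items as random. It estimates the variance components from the expected mean squares and then computes the standard p x i generalizability coefficient, Eρ² = σ²_p / (σ²_p + σ²_pi,e / n_i), and the dependability index, Φ = σ²_p / (σ²_p + (σ²_i + σ²_pi,e) / n_i). The function names (`g_study_p_by_i`, `d_study`, `cronbach_alpha`) and the score matrix are hypothetical illustrations, not the author's procedure.

```python
import numpy as np


def g_study_p_by_i(scores):
    """Estimate random-effects variance components for a crossed
    persons x items (p x i) design with one score per cell.

    scores: array-like, rows = persons, columns = items (hypothetical data).
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_i = x.shape
    grand = x.mean()

    # Sums of squares for a two-way ANOVA without replication.
    ss_p = n_i * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_i = n_p * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_tot = np.sum((x - grand) ** 2)
    ss_res = ss_tot - ss_p - ss_i  # p x i interaction confounded with error

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve the random-model expected mean squares:
    #   E(MS_p) = var_res + n_i * var_p,  E(MS_i) = var_res + n_p * var_i.
    # Negative estimates are set to zero, a common convention.
    return {
        "var_p": max((ms_p - ms_res) / n_i, 0.0),
        "var_i": max((ms_i - ms_res) / n_p, 0.0),
        "var_res": ms_res,
        "n_i": n_i,
    }


def d_study(comp, n_items):
    """Generalizability (relative error) and dependability (absolute error)
    coefficients for a test of n_items items sampled from the universe."""
    rel_err = comp["var_res"] / n_items
    abs_err = (comp["var_i"] + comp["var_res"]) / n_items
    e_rho2 = comp["var_p"] / (comp["var_p"] + rel_err)
    phi = comp["var_p"] / (comp["var_p"] + abs_err)
    return e_rho2, phi


def cronbach_alpha(scores):
    """Classical coefficient alpha, for comparison with E(rho^2)."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical scores: 5 persons (rows) x 4 items (columns).
scores = [[2, 3, 3, 4],
          [1, 2, 2, 3],
          [4, 4, 5, 5],
          [3, 3, 4, 4],
          [2, 2, 3, 4]]
comp = g_study_p_by_i(scores)
e_rho2, phi = d_study(comp, comp["n_i"])
print(round(e_rho2, 3), round(phi, 3), round(cronbach_alpha(scores), 3))
# → 0.966 0.851 0.966
```

In this crossed design, Eρ² reproduces coefficient alpha; this is the familiar ANOVA identity (MS_p − MS_res) / MS_p. Φ also counts item-difficulty variance σ²_i as error, so it is lower. The comparison shows that the error term, and therefore the coefficient, depends on the universe of generalization one intends. Calling `d_study(comp, n)` with other values of `n` projects both coefficients to tests of different lengths drawn from the same item universe.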
Generalizability theory illuminates the subtle inconsistency between the statistical analysis and the related universe of inference, a problem that has been unrecogniz...