Israel at Austin

Subsequent to an overview of the basic concepts in generalizability theory, a computer program for studying generalizability of scores in 3-facet (4-factor) designs is described. An illustrative example is also provided.

EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 1978, 38

In recent years educational and psychological researchers have begun to focus on the dependability of behavioral measurement. Reviews (Borich, 1977; Shavelson and Atwood, 1977) and psychological investigations of teacher effectiveness (Erlich and Shavelson, 1976a; Erlich and Borich, 1976a, b; Evertson, Anderson, Edgar, Minter and Brophy, 1977) have shown that one explanation for past failures to find empirically consistent relationships between classroom interactions and student achievement may lie in the process employed for quantifying both process and product variables. The purpose of this paper was to delineate some measurement problems raised by classical test theory, to describe how generalizability theory may be a solution to these problems, and to present a computer program for studying reliability (generalizability) with a 4-factor design.
Limitation of the Classical Approach to Reliability

In most past classroom research, the reliability of process and product measures, namely, the consistency of rank ordering of the measures, was estimated by one or more of the following methods: (1) interrater reliability: the amount of agreement between two or more independent observers or scorers; (2) stability coefficient: the reliability of scores across time, content, or students; and (3) test-retest