“…The quality of observational data is usually judged from interobserver agreement scores because of the difficulty in procuring a criterion against which to measure the observers' actual accuracy. Possible accuracy criteria, however, include mechanical measurements of behavior (e.g., Bechtel, 1967), mechanically generated responses (e.g., Repp, Roberts, Slack, Repp, & Berkler, 1976), recorded behaviors orchestrated by a predetermined script (e.g., Mash & McElwee, 1974), and consensually validated criterion protocols produced by the observation of multiple observers (e.g., Kent et al., 1974; Foster & Cone, 1980). Although agreement is generally used to evaluate the quality of observational data, agreement and accuracy are not the same (Foster & Cone, 1980; Johnson & Bolstad, 1973; Kazdin, 1977).…”