This paper reviews four projects aimed at evaluating the technical quality of recent summative evaluations. It also helps identify some technical problems that frequently occur in evaluation research, and it outlines practical methods for solving these problems within the limits imposed by the current state of the art. The methods involve evaluating evaluations during or after their completion, with a stress on the former. The limitations of each method are outlined, together with ways of overcoming them.

This paper deals with what Orata (1940) called the "evaluation of evaluations" and what Scriven (1969) termed "metaevaluation." Our purpose is to describe and comment upon some recent systematic attempts to evaluate evaluations and to describe some ways in which metaevaluation can currently be used to improve the technical quality of evaluation research.

Bernstein and Freeman (1975: 21) wanted "to identify, using generally accepted criteria of social science research quality, the conditions under which high- and low-quality evaluation investigations occur." Hence, they retrieved information about 1,000 federally funded evaluations for fiscal year 1970 that were ongoing, had budgets in excess of $10,000, and were targeted at problems in health, education, welfare, manpower, income security, housing, or public safety. Two judges read abstracts of the studies or the original research proposals, and from these it was determined that 416 of the studies met preordained criteria of "evaluation research." Questionnaires were then mailed to the project directors of each evaluation, and 318 were returned. Of these, 82 were omitted from further consideration because the project directors did not characterize their studies as evaluations.
This meant that Bernstein and Freeman's basic data came from 236 evaluations, each of which was rated by personnel doing the evaluation while the study was still under way.

Bernstein and Freeman used the questionnaire returns to determine how well each evaluation met their criteria for a "comprehensive" evaluation. The latter was defined as a study in which appropriate techniques were used to assess whether an intervention had been implemented as planned (the process component), and had produced its intended outcomes (the impact component). Process was defined and rated in terms of sampling, the nature of the data, and statistical procedures. Impact was defined and rated with reference to design, sampling, and measurement. Indexes were constructed for each of these six dimensions, and each was given equal weight in computing a global index of overall quality. Bernstein and Freeman concluded that about two-thirds of the evaluations were "comprehensive" in that they assessed both "process" and "impact," but only about 13% of the total sample of 236 evaluations were "adequate" in that they had "high" ratings on each of the six scales.

Several points must be borne in mind about this last conclusion. First, it migh...