One of the concerns of the compiler writer is the quality of object programs produced by the compiler, and in particular their performance at execution time. A survey of methods for measuring this performance, together with experiments in the use of those methods, is presented. We examine two general categories of evaluation: comparative evaluation, in which benchmark programs are run on groups of language systems; and analytic evaluation, in which a single system is measured in terms determined by its own structure. Besides surveying the results of various evaluation experiments, we present in detail the results of a series of experiments on a particular language system (PDP11 Al.GTO