Computer-aided diagnostic (CAD) schemes have been developed for assisting radiologists in the detection of various lesions in medical images. The reliable evaluation of CAD schemes is an important task in the field of CAD research. In the past, many evaluation approaches, such as the resubstitution, leave-one-out, cross-validation, hold-out, and bootstrap methods, have been proposed for evaluating the performance of various CAD schemes. However, some important issues in the evaluation of CAD schemes have not been systematically analyzed, either theoretically or experimentally. The first important issue is the analysis and comparison of the various evaluation methods in terms of their key characteristics, in particular, the bias of the estimated performance and the generalization performance of the trained CAD schemes. The second is the analysis of pitfalls in the incorrect use of these evaluation methods and of effective approaches to reducing the bias and variance that such pitfalls introduce. We address the first issue in detail in this article and discuss the second in the Discussion section. We believe that this article will be useful to researchers in the field of CAD for selecting appropriate evaluation methods and for improving the reliability of the estimated performance of their CAD schemes.
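The optimistic bias of resubstitution relative to leave-one-out, one of the characteristics analyzed in this article, can be illustrated with a minimal sketch. The toy one-dimensional dataset and the 1-nearest-neighbor classifier below are purely hypothetical choices for illustration, not the classifiers or data studied in this work:

```python
# Hypothetical illustration: resubstitution vs. leave-one-out evaluation
# of a 1-nearest-neighbor classifier on a toy 1-D dataset.

def nn_predict(train, x, exclude=None):
    """Predict the label of x as the label of its nearest training point.
    `exclude` optionally skips one training index (used for leave-one-out)."""
    best = None
    for i, (xi, yi) in enumerate(train):
        if i == exclude:
            continue
        d = abs(xi - x)
        if best is None or d < best[0]:
            best = (d, yi)
    return best[1]

# Toy data: (feature, label); the two classes overlap near the boundary.
data = [(0.0, 0), (0.5, 0), (1.0, 0), (1.5, 0),
        (1.8, 1), (2.4, 1), (3.0, 1), (3.5, 1)]

# Resubstitution: test on the training set itself. For 1-NN this is
# maximally optimistic, since each point's nearest neighbor is itself.
resub = sum(nn_predict(data, x) == y for x, y in data) / len(data)

# Leave-one-out: each point is classified by the remaining n-1 points,
# so the tested case is never part of the training set.
loo = sum(nn_predict(data, x, exclude=i) == y
          for i, (x, y) in enumerate(data)) / len(data)

print(f"resubstitution accuracy: {resub:.2f}")  # prints 1.00
print(f"leave-one-out accuracy:  {loo:.2f}")    # prints 0.75
```

The gap between the two estimates (1.00 versus 0.75 here) is the optimistic bias of resubstitution: testing on the training cases overstates how the scheme would perform on unseen cases.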