Randomized benchmarking (RB) is a popular procedure used to gauge the performance of a set of gates useful for quantum information processing (QIP). Recently, Proctor et al. [Phys. Rev. Lett. 119, 130502 (2017)] demonstrated a practically relevant example where the RB measurements give a number r very different from the actual average gate-set infidelity ε, despite past theoretical assurances that the two should be equal. Here, we derive formulas for ε, and for r from the RB protocol, in a manner permitting easy comparison of the two. We show in general that, indeed, r ≠ ε, i.e., RB does not measure average infidelity, and, in fact, neither one bounds the other. We give several examples, all plausible in real experiments, to illustrate the differences in ε and r. Many recent papers on experimental implementations of QIP have claimed the ability to perform high-fidelity gates because they demonstrated small r values using RB. Our analysis shows that such a statement from RB alone has to be interpreted with caution.