As an increasingly popular authentication method, biometrics is being integrated into a wide range of devices, smartphones in particular. When an evaluation is performed on an operational device, the assessors have no access to the biometric samples or the algorithm, so they must evaluate the system as a black box. This kind of evaluation requires numerous manual comparisons. This paper proposes a methodology for evaluating biometric black boxes. To develop it, two experiments were performed to determine an optimized procedure. Because these experiments were conducted with a small test population, a full-scale evaluation was then performed to assess the methodology. This paper describes the methodology used to evaluate black-box systems and the results obtained on the systems under test.