The performance of many visual recognition algorithms, e.g. face verification and person re-identification, heavily depends on the metric used to measure the similarity between input data. Recent works have shown the benefit of learning an optimal metric for a particular task using Metric Learning (ML). Most approaches learn a Mahalanobis metric based on an objective function whose constraints come either from a set of labelled examples or, more frequently, from sets of positive (same-class) and negative (different-class) pairs.

Among recent ML methods, the boosting-based algorithms of Shen et al. [4] and Bi et al. [2] deserve particular attention due to the interesting properties they offer: i) they are efficient and scalable, as no semidefinite programming is required; at each iteration, only the largest eigenvalue and the corresponding eigenvector are needed; ii) like AdaBoost, they have no parameters to tune and are easy to implement, as no sophisticated optimization techniques are involved. They hence contrast with most commonly used ML methods, for which hyper-parameters, often introduced for regularization purposes, have to be adjusted by cross-validation.

However, one strong limitation of these approaches [2, 4] is that they require triplets for learning the metric, i.e. constraints of the form D(x_a, x_b) < D(x_a, x_c), where x_a, x_b, x_c are three input vectors whose labels are known: the positive and the negative pair within a constraint have to share a common vector. This is a limitation because, for most verification tasks, only training pairs are available (e.g.
for person re-identification on the VIPeR dataset, only pairs of same/different persons are provided for training, so such triplets cannot be formed).

One of the key contributions of this paper is a boosting-based metric learning approach that can be trained from pairs of points, i.e. from constraints of the form D(x_a, x_b) < D(x_c, x_d), while keeping the good properties of [4] (scalability, simplicity, no parameters, etc.). Our approach builds on the ranking algorithm RankBoost [3], known to offer three particularly interesting features: i) no hyper-parameters, ii) great robustness to overfitting (explained with standard VC-dimension analysis techniques), and iii) a computational trick that reduces the complexity of the learning step. In the following, we show how to build on RankBoost [3] to efficiently address this metric learning problem.

Moreover, we propose a method for building hierarchical face representations for efficient identity-based face retrieval. The main idea is to use an ML algorithm to build a tree-structured representation of a large set of faces, such that faces of the same identity belong to the same cluster at each tree level. The hierarchical clustering is built recursively: at each level, a new metric is learned using the faces belonging to a particular tree node, which is then...
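To make the two constraint forms concrete, the following is a minimal sketch (not the paper's algorithm) of checking pair-based constraints D(x_a, x_b) < D(x_c, x_d) under a Mahalanobis metric M. The function names (`mahalanobis_dist`, `pair_constraint_violations`) are illustrative assumptions; triplet constraints are simply the special case where x_c = x_a.

```python
import numpy as np

def mahalanobis_dist(M, x, y):
    """Squared Mahalanobis distance D_M(x, y) = (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)

def pair_constraint_violations(M, constraints):
    """Count violated pair constraints D_M(x_a, x_b) < D_M(x_c, x_d).

    `constraints` is a list of ((x_a, x_b), (x_c, x_d)) tuples, where
    the first pair is positive (same class) and the second negative.
    No shared vector is needed between the two pairs.
    """
    return sum(
        mahalanobis_dist(M, xa, xb) >= mahalanobis_dist(M, xc, xd)
        for (xa, xb), (xc, xd) in constraints
    )

# Toy example: under the identity metric, the positive pair is closer
# than the negative pair, so the single constraint is satisfied.
xa, xb = np.array([0.0, 0.0]), np.array([0.1, 0.0])   # same identity
xc, xd = np.array([0.0, 1.0]), np.array([2.0, 0.0])   # different identities
print(pair_constraint_violations(np.eye(2), [((xa, xb), (xc, xd))]))  # → 0
```

A metric learner would then seek an M (positive semidefinite) minimizing such violations; the approach proposed here does so via RankBoost-style ranking of pairs rather than semidefinite programming.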
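The recursive construction of the hierarchical representation can be sketched as follows. This is a simplified illustration under loud assumptions: `learn_metric` is a stand-in (a diagonal whitening metric) for the actual per-node metric learning described above, and the split is a short k-means-style refinement under each node's metric; all names and parameters are hypothetical.

```python
import numpy as np

def learn_metric(faces):
    """Stand-in for the per-node metric-learning step: a diagonal
    Mahalanobis metric that whitens by inverse feature variances.
    The actual algorithm would learn M from pair constraints."""
    return np.diag(1.0 / (faces.var(axis=0) + 1e-8))

def build_tree(faces, depth=0, k=2, max_depth=3, min_size=4):
    """Recursively split `faces` (n x d array) into k clusters,
    learning a new metric at each internal node."""
    if depth >= max_depth or len(faces) < min_size:
        return {"faces": faces, "children": []}          # leaf node
    M = learn_metric(faces)
    rng = np.random.default_rng(0)
    centers = faces[rng.choice(len(faces), k, replace=False)]
    for _ in range(10):  # a few refinement passes, for brevity
        d = np.array([[(f - c) @ M @ (f - c) for c in centers] for f in faces])
        labels = d.argmin(axis=1)
        centers = np.array([faces[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    children = [build_tree(faces[labels == j], depth + 1, k, max_depth, min_size)
                for j in range(k)]
    return {"faces": faces, "children": children}
```

At retrieval time, a query face would descend the tree by comparing against each child's cluster under that node's metric, restricting the search to a small leaf.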