The Gaussian Probabilistic Linear Discriminant Analysis (PLDA) model assumes Gaussian priors for the latent variables that represent the speaker and channel factors. Under the assumption that each training i-vector belongs to a different speaker, as is usually done in i-vector extraction, the i-vectors generated by a PLDA model can be considered independent and identically distributed with a Gaussian distribution. We have therefore recently proposed transforming the development i-vectors so that their distribution becomes more Gaussian-like. This is achieved by means of a sequence of affine and non-linear transformations whose parameters are trained by Maximum Likelihood (ML) estimation on the development set. The evaluation i-vectors are then subjected to the same transformation. Although this i-vector "gaussianization" has proven effective, the original independence assumption is not satisfied, because i-vectors extracted from segments of the same speaker are not independent. In this work we show that the model can be improved by properly exploiting the speaker label information, which was ignored in the previous model. In particular, a more effective PLDA model can be obtained by jointly estimating the PLDA parameters and the parameters of the non-linear i-vector transformation. In other words, whereas the goal of the previous approach was to "gaussianize" the distribution of the training i-vectors, the objective of this work is to embed the estimation of the non-linear i-vector transformation in the PLDA model estimation. We thus refer to this model as the non-linear PLDA (NL-PLDA) model. We show that this new approach provides significant gains with respect to PLDA, and a small yet consistent improvement with respect to our former i-vector "gaussianization" approach, at no additional cost.
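To make the "gaussianization" idea concrete, the following is a minimal, hypothetical sketch (not the authors' actual transformation) of the underlying principle: an invertible affine + non-linear map is fitted to one-dimensional data by ML, i.e. by maximizing the standard-normal log-likelihood of the transformed samples plus the log-Jacobian term from the change of variables. Here an `arcsinh`-based map with illustrative parameters (a, b, d) stands in for the paper's transformation sequence:

```python
import numpy as np
from scipy.optimize import minimize

# Deliberately non-Gaussian "development" data (stand-in for i-vector entries).
rng = np.random.default_rng(0)
x = rng.exponential(1.0, size=500)

def transform(params, x):
    """Invertible map y = a * arcsinh(b*x + d) and its log-Jacobian."""
    a, b, d = params
    u = b * x + d
    y = a * np.arcsinh(u)
    # log|dy/dx| = log|a| + log|b| - 0.5 * log(1 + u^2)
    logdet = np.log(np.abs(a)) + np.log(np.abs(b)) - 0.5 * np.log1p(u ** 2)
    return y, logdet

def neg_log_lik(params):
    """Negative log-likelihood of x under N(0,1) pushed through the map."""
    y, logdet = transform(params, x)
    return -np.sum(-0.5 * y ** 2 - 0.5 * np.log(2.0 * np.pi) + logdet)

# ML estimation of the transformation parameters on the development data.
x0 = np.array([1.0, 1.0, 0.0])
res = minimize(neg_log_lik, x0, method="Nelder-Mead")

# Evaluation data would then be subjected to the same fitted transform.
y, _ = transform(res.x, x)
```

The NL-PLDA model described above goes one step further: instead of fitting such a transform against a fixed standard-normal target, its parameters would be estimated jointly with the PLDA parameters, so that the speaker labels inform the transformation.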