This thesis proposes and evaluates new machine learning techniques and models for several speech processing tasks. It mainly addresses the identification of speakers, languages, and accents using several proposed descriptors based on different sound representations. In addition, it presents a new transfer learning technique based on a new descriptor, and two new deep learning architectures based on complementary audio representations.

The new transfer learning technique relies on a descriptor we call Grad-Transfer, which builds on the model interpretability method Gradient-weighted Class Activation Mapping (Grad-CAM). Grad-CAM generates a heat map of the most relevant regions of the input data according to their influence on a given model prediction. To develop Grad-Transfer, we demonstrate experimentally, using the Birch and k-means clustering algorithms, that the heat maps generated by Grad-CAM store part of the knowledge that a spectrogram-fed deep learning speech processing model acquires during training. We exploit this capability of Grad-CAM to formulate a new technique that transfers knowledge from a pre-trained model to an untrained one through the Grad-Transfer descriptor, which summarizes and reuses that knowledge. Several Grad-Transfer-based models were evaluated on the accent identification task using the Voice Cloning Toolkit dataset. These models include Gaussian Naive Bayes, Support Vector Machine, and Passive Aggressive classifiers. Experimental results show a performance increase of up to 23.58% for models fed with both Grad-Transfer descriptors and spectrograms compared to models fed with spectrograms alone.
This demonstrates the ability of Grad-Transfer to improve the performance of speech processing models and opens the door to new implementations for similar tasks.

In addition, new transfer learning approaches based on embedding generation models were evaluated. Embeddings are generated by machine learning models trained for a specific task on large datasets. By exploiting the knowledge they have already acquired, these models can be reused for new tasks where little data is available. This thesis proposes a new deep learning architecture, called Mel and Wave Embeddings for Human Voice Tasks (MeWEHV), capable of generating robust embeddings for speech processing. MeWEHV combines embeddings generated by a pre-

Overall, this thesis presents several advances in the areas of speaker, language, and accent identification, and proposes new techniques and models that use transfer learning to improve the performance of the evaluated state-of-the-art models.
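
As an illustration of the Grad-CAM heat maps on which Grad-Transfer builds, the following minimal sketch computes a heat map following the standard Grad-CAM formulation: each feature map of a convolutional layer is weighted by the spatial average of the gradient of the class score with respect to that map, the weighted maps are summed, and negative values are clipped. It is not the thesis implementation. The function name `grad_cam`, the array shapes, and the toy input values are assumptions made for the example, and the step that turns heat maps into Grad-Transfer descriptors is not shown because the abstract does not specify it.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Return a Grad-CAM heat map scaled to [0, 1].

    activations: feature maps of one convolutional layer, shape (K, H, W).
    gradients:   gradient of the class score w.r.t. those maps, shape (K, H, W).
    Both arrays are assumed to have been extracted from a trained model.
    """
    # Channel importance: global average pooling of the gradients.
    weights = gradients.mean(axis=(1, 2))               # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)    # shape (H, W)
    # ReLU keeps only regions that support the predicted class.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1]; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy input (hypothetical values): two 2x2 feature maps.
activations = np.array([[[1.0, 2.0], [3.0, 4.0]],
                        [[2.0, 2.0], [2.0, 2.0]]])
# The first channel supports the class, the second opposes it.
gradients = np.array([[[1.0, 1.0], [1.0, 1.0]],
                      [[-1.0, -1.0], [-1.0, -1.0]]])

heat_map = grad_cam(activations, gradients)
print(heat_map)
```

In a spectrogram-fed model, a heat map of this kind has one value per time-frequency region of the chosen layer and is usually upsampled to the input size before it is inspected or clustered.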