In this paper, we investigate two neural architectures for gender detection and speaker identification tasks using Mel-frequency cepstral coefficient (MFCC) features, which do not cover the voice-related characteristics. One of our goals is to compare two neural architectures, multilayer perceptrons (MLPs) and convolutional neural networks (CNNs), on both tasks under various settings and to learn gender- and speaker-specific features automatically. The experimental results reveal that models using z-score and Gramian matrix transformations obtain better results than models that use only max-min normalization of the MFCCs. In terms of training time, the MLP requires more training epochs to converge than the CNN. Other experimental results show that MLPs outperform CNNs on both tasks in terms of generalization error.
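The three feature transformations named in the abstract above can be illustrated with a minimal sketch. The abstract does not define its "Gramian matrix transformation", so the sketch assumes the plain Gram matrix of pairwise inner products between frames; the function names and the toy `frames` data are hypothetical, and in practice the values would be MFCCs extracted from audio.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale one coefficient track to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero on constant input
    return [(v - lo) / span for v in values]


def z_score(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid division by zero on constant input
    return [(v - mu) / sigma for v in values]


def gram_matrix(frames):
    """Assumed reading of 'Gramian matrix': G[i][j] = <frame_i, frame_j>."""
    return [[sum(a * b for a, b in zip(u, v)) for v in frames] for u in frames]


# Hypothetical toy input: 3 frames, 2 coefficients per frame.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Normalize each coefficient (column) across frames, then rebuild the frames.
columns = [list(col) for col in zip(*frames)]
z_frames = [list(row) for row in zip(*(z_score(col) for col in columns))]

# The Gram matrix is square (frames x frames) and symmetric, so it can be
# treated as a 2-D, image-like input.
gram = gram_matrix(z_frames)
```

Under this assumption the Gram matrix turns a sequence of frames into a square array, which is one plausible reason such a transformation would pair naturally with a CNN; the abstract itself does not state the motivation.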
This article discusses classification algorithms for the problem of personal identification by voice using machine learning methods. We used the MFCC algorithm for speech preprocessing. To solve the problem, we carried out a comparative analysis of five classification algorithms. In the first experiment, the support vector method (0.90) and the multilayer perceptron (0.83) showed the best results. In the second experiment, a multilayer perceptron using the Robust scaler method was proposed for personal identification, reaching an accuracy of 0.93. Therefore, a multilayer perceptron can be used to solve this problem, taking into account the specifics of the speech signal.
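The abstract above does not say what the "Robust scaler method" computes. The sketch below assumes it means the usual robust scaling, in which each feature is centered on its median and divided by its interquartile range, as scikit-learn's `RobustScaler` does by default. The `robust_scale` helper and the toy values are hypothetical.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    # method="inclusive" interpolates linearly between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    scale = (q3 - q1) or 1.0  # leave constant features unscaled
    med = median(values)
    return [(v - med) / scale for v in values]


# Hypothetical coefficient track with one outlier (100.0).
coeff = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_scale(coeff)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

Because the median and IQR ignore extreme values, the outlier does not compress the remaining points the way it would under min-max normalization, which may matter for speech features with occasional spikes.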