2020
DOI: 10.1080/23311916.2020.1727168

Neural architectures for gender detection and speaker identification

Abstract: In this paper, we investigate two neural architectures for gender detection and speaker identification tasks by utilizing Mel-frequency cepstral coefficient (MFCC) features, which do not cover the voice-related characteristics. One of our goals is to compare different neural architectures, multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs), for both tasks with various settings and to learn the gender/speaker-specific features automatically. The experimental results reveal that the models using …
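The abstract does not spell out the feature-extraction or model settings; as a rough illustration of such a pipeline, the sketch below extracts MFCCs with librosa and classifies them with a small MLP in PyTorch. The file name, sampling rate, layer sizes, and mean/std pooling are assumptions made for the example, not the paper's configuration.

```python
# Illustrative sketch only: MFCC features + a small MLP for gender detection.
# Library choices and hyperparameters are assumptions, not the authors' setup.
import librosa
import numpy as np
import torch
import torch.nn as nn

def extract_mfcc(path, n_mfcc=13):
    """Load an audio file and return a fixed-size feature vector:
    per-coefficient mean and std over time (a simple pooling choice)."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # (2 * n_mfcc,)

class GenderMLP(nn.Module):
    """Minimal multi-layer perceptron for binary gender classification."""
    def __init__(self, in_dim=26, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),   # two output classes
        )

    def forward(self, x):
        return self.net(x)

# Hypothetical usage (file name is a placeholder):
# feats = torch.tensor(extract_mfcc("sample.wav"), dtype=torch.float32).unsqueeze(0)
# logits = GenderMLP()(feats)
```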


Cited by 20 publications (11 citation statements). References 20 publications (18 reference statements).
“…For a perfect approximation to the minimum error of the neural network, the learning rate should tend to an infinitesimal value to ensure the best convergence of the learning algorithm. However, the smaller the selected learning step, the longer the training takes [9].…”
Section: Training a Neural Network by Back Propagation of Error (mentioning)
confidence: 99%
“…When modeling sequences in the time domain with RNNs or convolutional neural networks (CNNs) instead of HMMs, one encounters the problem of data alignment. The loss functions of both RNNs and CNNs are defined at each point in the sequence; therefore, to train the model, the alignment between the output sequence and the target sequence must be known [13].…”
Section: End-to-end Model Based on Connectionist Temporal Classification (mentioning)
confidence: 99%
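Per the section title above, the citing paper's remedy for this alignment problem is connectionist temporal classification (CTC). The sketch below shows how a CTC loss is applied without frame-level alignments, using PyTorch's torch.nn.CTCLoss; the tensor shapes and label sizes are arbitrary placeholders, not values from either paper.

```python
import torch
import torch.nn as nn

# CTC marginalizes over all valid alignments between the per-frame outputs and
# the (shorter) target label sequence, so no frame-level alignment is required.
T, N, C = 50, 2, 20   # input frames, batch size, classes (index 0 reserved for blank)
log_probs = torch.randn(T, N, C).log_softmax(dim=2)        # per-frame log-probabilities
targets = torch.randint(1, C, (N, 10), dtype=torch.long)   # unaligned target sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```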
“…The neural network used in this work is a fully connected neural network with a rectified linear unit (ReLU) [14] as the activation function. The rectified linear unit is essentially a piecewise function that maps all negative values to 0 while leaving positive values unchanged.…”
Section: Fully Connected Neural Network Modeling (mentioning)
confidence: 99%
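A minimal illustration of the rectified linear unit described above (my own sketch, not the cited paper's code):

```python
import numpy as np

# ReLU(x) = max(0, x): negative inputs become 0, positive inputs pass through.
def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # -> [0.  0.  0.  1.5 3. ]
```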