2020
DOI: 10.1007/s10772-020-09771-2

Enhancement in speaker recognition for optimized speech features using GMM, SVM and 1-D CNN

citations
Cited by 23 publications
(6 citation statements)
references
References 43 publications
“…Multiple machine learning-based classifiers, including the GMM, hidden Markov model (HMM) [21], multilayer perceptron (MLP), k-nearest neighbor (k-NN) [22], support vector machine (SVM) [23], and random forest (RF), have been used by many researchers to identify speakers from audio data signals. These classifiers have been extensively used in speech-related applications, including automatic speaker identification and emotion recognition.…”
Section: Classification Methods (mentioning)
confidence: 99%
“…The block diagram for the CNN architecture is explained in figure 3. Using differential features along with high-level convolution features captures sufficient speaker information and yielded better results for ASR [51]. The convolution layers consist of a set of filters or kernels that move across the input image information in a specified manner to perform convolution. The computation for a convolution layer in the lth layer is given in equation (9).…”
Section: CNN for Feature Extraction (mentioning)
confidence: 99%
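The statement above describes kernels sliding across the input to perform convolution, but equation (9) is not reproduced in this excerpt. As a rough illustration only, the following is a minimal sketch of a single 1-D convolution filter, assuming stride 1, no padding, and a ReLU activation. The function name `conv1d_valid`, the example values, and the choice of ReLU are assumptions made for illustration, not the cited authors' formulation. As in most deep-learning libraries, the kernel is not flipped, so this is strictly a cross-correlation.

```python
import numpy as np

def conv1d_valid(signal, kernel, bias=0.0):
    """Slide `kernel` across `signal` (stride 1, no padding), add a bias,
    and apply ReLU. Hypothetical helper for illustration only."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    n_out = signal.size - kernel.size + 1
    if n_out < 1:
        raise ValueError("kernel is longer than the input signal")
    out = np.empty(n_out)
    for i in range(n_out):
        # Dot product of the kernel with the window starting at position i.
        out[i] = np.dot(signal[i:i + kernel.size], kernel) + bias
    # ReLU activation (assumed here; the excerpt does not name one).
    return np.maximum(out, 0.0)

# Made-up feature sequence and a two-tap differencing kernel.
features = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
diff_kernel = np.array([-1.0, 1.0])
result = conv1d_valid(features, diff_kernel)
print(result)
```

A convolution layer would apply many such kernels in parallel, each with learned weights, and stack the outputs as feature channels.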
“…But only ten speakers were used for evaluation, and each utterance contained only one word. Nainan Kulkarni et al [20] evaluated 1D CNN, SVM and GMM based on dynamic MFCC features. The 1D CNN-based model achieved a validation accuracy of about 73.25% on the VidTimit dataset.…”
Section: Literature Review (mentioning)
confidence: 99%