2021
DOI: 10.3390/s21155097

A Two-Level Speaker Identification System via Fusion of Heterogeneous Classifiers and Complementary Feature Cooperation

Abstract: We present a new architecture to address the challenges of speaker identification that arise in interaction of humans with social robots. Though deep learning systems have led to impressive performance in many speech applications, limited speech data at training stage and short utterances with background noise at test stage present challenges and are still open problems as no optimum solution has been reported to date. The proposed design employs a generative model namely the Gaussian mixture model (GMM) and a…

Cited by 11 publications (17 citation statements)
References 62 publications
“…Feature extraction is accomplished by changing the speech waveform to a form of parametric representation at a relatively lesser data rate for subsequent processing and analysis [11, 12, 13, 14]. Feature extraction approaches usually yield a multidimensional feature vector for every speech signal.…”
Section: Related Work (mentioning)
confidence: 99%
“…It does not correspond linearly to the physical frequency of the tone, as the human auditory system apparently does not perceive pitch linearly. The Mel Filter is approximately a linear frequency spacing below 1 kHz and a logarithmic spacing above 1 kHz [4].…”
Section: Windowing (mentioning)
confidence: 99%
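The linear-then-logarithmic spacing described in the statement above can be illustrated with a small sketch. It assumes the widely used conversion mel = 2595 · log10(1 + f / 700); the quoted text does not say which mel formula the citing paper uses, and the function names here are illustrative only.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert frequency in Hz to mels (assumed 2595/700 formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(n_filters: int, f_min_hz: float, f_max_hz: float) -> list:
    """Centre frequencies (Hz) of filters spaced evenly on the mel axis."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Equal steps in mel become progressively wider steps in Hz.
    for centre in mel_filter_centers(10, 0.0, 8000.0):
        print(round(centre, 1))
```

Under this formula 1000 Hz maps to roughly 1000 mel, and the gaps between neighbouring filter centres grow with frequency, which matches the behaviour the statement describes.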
“…In addition to fusion of feature extraction techniques, fusion of different types of classifiers are applied to improve the performance of speaker recognition. In [4] fusion of GMM and SVM is used to develop a speaker recognition system with MFCC and GFCC feature extraction techniques. In the study [9] speaker verification is developed by using fusion of GMM and ANN models with GFCC features.…”
Section: Related Work (mentioning)
confidence: 99%
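As a rough illustration of what classifier fusion can look like, the sketch below combines per-speaker scores from two classifiers at the score level. The statement above does not specify the fusion rule, so the min-max normalisation, the weighted sum, the `weight` parameter, and all function names are assumptions made for this example, not the method of the cited paper.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so the two classifiers are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(gmm_scores, svm_scores, weight=0.5):
    """Weighted sum of normalised scores; weight applies to the GMM."""
    if len(gmm_scores) != len(svm_scores):
        raise ValueError("both classifiers must score the same speakers")
    g, s = min_max(gmm_scores), min_max(svm_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(g, s)]


def identify(gmm_scores, svm_scores, weight=0.5):
    """Index of the speaker with the highest fused score."""
    fused = fuse_scores(gmm_scores, svm_scores, weight)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    gmm = [-120.0, -95.0, -110.0]  # hypothetical log-likelihoods
    svm = [0.2, 0.1, 0.9]          # hypothetical decision values
    print(identify(gmm, svm, weight=0.5))
```

Normalising first matters because GMM log-likelihoods and SVM decision values live on very different scales; without it one classifier would dominate the sum regardless of the weight.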
“…Some well-designed metric learning losses have been exploited to train the entire SV system in an end-to-end fashion, such as triplet loss [9, 11], generalized end-to-end (GE2E) loss [18], and cluster-range loss [4]. Besides, many studies on robust features [20, 21] and hybrid models [21, 22] have been conducted to further improve the performance of traditional and DNN-based speaker recognition systems. In recent years, CNN has drawn much attention in this research field.…”
Section: Introduction (mentioning)
confidence: 99%
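For readers unfamiliar with the triplet loss named in the statement above, a minimal sketch of its usual form follows. It assumes squared Euclidean distance between embeddings and a margin of 0.2; the works cited in the statement may use a different distance (for example cosine similarity) or a different margin.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the same-speaker pair is closer than the
    different-speaker pair by at least the margin.
    """
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


if __name__ == "__main__":
    anchor, positive = [0.0, 0.0], [0.1, 0.0]
    print(triplet_loss(anchor, positive, [1.0, 0.0]))  # well-separated negative
    print(triplet_loss(anchor, positive, [0.2, 0.0]))  # negative too close
```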