2017
DOI: 10.1109/taslp.2017.2661705

Deep Learning Backend for Single and Multisession i-Vector Speaker Recognition

Abstract: The promising performance of Deep Learning (DL) in speech recognition has motivated the use of DL in other speech technology applications such as speaker recognition. Given i-vectors as inputs, the authors proposed an impostor selection algorithm and a universal model adaptation process in a hybrid system based on Deep Belief Networks (DBN) and Deep Neural Networks (DNN) to discriminatively model each target speaker. In order to have more insight into the behavior of DL techniques in both single and multi-sessi…
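The abstract does not spell out the impostor selection rule. As an illustration only, the sketch below assumes a common heuristic: rank a pool of unlabeled background i-vectors by cosine similarity to the target speaker's i-vector and keep the k closest as impostor (negative) examples for training that speaker's discriminative model. The function names, the value of k and the toy 3-dimensional vectors are hypothetical and are not taken from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def select_impostors(target, background, k):
    """Return indices of the k background i-vectors most similar to the target.

    Assumption: "closest" means highest cosine similarity. This is a generic
    heuristic, not necessarily the selection rule used by the authors.
    """
    ranked = sorted(range(len(background)),
                    key=lambda i: cosine(target, background[i]),
                    reverse=True)
    return ranked[:k]

# Toy data: one target i-vector and four unlabeled background i-vectors.
target = [1.0, 0.0, 0.0]
background = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.0, 0.7], [-1.0, 0.0, 0.0]]
print(select_impostors(target, background, 2))  # [0, 2]
```

Choosing impostors that lie close to the target concentrates the negative examples near the decision boundary, which is where a discriminative model such as a DNN has the most to learn.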


Cited by 55 publications (31 citation statements)
References 42 publications (69 reference statements)
“…All the speech segments are chopped into 2 s segments and converted to 300-dimensional speaker vectors. Afterwards, the resulting speaker vectors are clustered using a two-stage unsupervised clustering technique, which was used to estimate the speaker labels of the background data for training the Probabilistic Linear Discriminant Analysis (PLDA) in [4]. The first stage of the clustering algorithm is similar to the Mean Shift-based algorithm proposed in [5] and used successfully in [6].…”
Section: Scoring (mentioning)
confidence: 99%
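The quoted pipeline chops speech into 2 s windows and clusters the resulting speaker vectors, with a first stage similar to Mean Shift. A hedged sketch of those two steps follows. It uses a generic flat-kernel mean shift with a cosine-similarity window, which is an assumption rather than the algorithm of [5]; the threshold values, function names and toy 2-dimensional vectors are hypothetical, vector extraction is not shown, and the unspecified second clustering stage is omitted. All vectors are assumed non-zero.

```python
import math

def chop(num_samples, sample_rate, seg_seconds=2.0):
    """Split a signal into consecutive fixed-length windows, returned as
    (start, end) sample indices; the last window may be shorter."""
    step = int(seg_seconds * sample_rate)
    return [(s, min(s + step, num_samples)) for s in range(0, num_samples, step)]

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity of two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def mean_shift_modes(points, threshold=0.8, iters=20):
    """Flat-kernel mean shift with a cosine-similarity window.

    Every point starts as its own mode; on each iteration a mode moves to the
    re-normalized mean of all points whose cosine similarity to it exceeds
    `threshold`."""
    modes = [normalize(p) for p in points]
    for _ in range(iters):
        new_modes = []
        for m in modes:
            nbrs = [p for p in points if cosine(m, p) > threshold]
            if not nbrs:  # empty window: leave this mode where it is
                new_modes.append(m)
                continue
            mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
            new_modes.append(normalize(mean))
        modes = new_modes
    return modes

def cluster_labels(points, threshold=0.8, iters=20, merge=0.99):
    """Assign the same integer label to points whose modes converged to
    (nearly) the same direction."""
    centers, labels = [], []
    for m in mean_shift_modes(points, threshold, iters):
        for idx, c in enumerate(centers):
            if cosine(m, c) > merge:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

print(chop(5 * 8000, 8000))  # [(0, 16000), (16000, 32000), (32000, 40000)]
vectors = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.05, 0.95]]
print(cluster_labels(vectors))  # [0, 0, 1, 1]
```

Mean shift does not need the number of speakers in advance: the number of clusters emerges from how many distinct modes survive, which suits estimating labels for unlabeled background data.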
“…The main research trend consists of designing biometric recognition methods that are robust to poor-quality signals, and the research community is mainly focused on DL techniques, which learn the discriminative representation of an individual directly from the raw input signal [18].…”
Section: Voice (mentioning)
confidence: 99%
“…Recently, biometric systems based on DL techniques and CNNs have been gaining popularity and have achieved accuracy improvements for face, fingerprint [30], iris [6], palm [49], ECG [44], voice [18], and gait [50] recognition, as well as for age and gender estimation. DL techniques are also being used in multibiometric systems to increase accuracy [2] or to learn multiple representations from the same biometric sample [22].…”
Section: Accuracy and Execution Time (mentioning)
confidence: 99%
“…As an example of a frontend, in [22,25], a vector representation of speakers was proposed by means of RBM adaptation. As a backend, in [26], various impostor selection algorithms are proposed to reduce the performance gap between cosine and PLDA scoring techniques without the use of labeled data. They applied DBN adaptation as a backend for i-vector based speaker verification.…”
Section: Introduction (mentioning)
confidence: 99%
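Cosine scoring, the label-free baseline whose gap to PLDA the cited backend tries to close, compares an enrolment i-vector with a test i-vector directly and has no trained parameters. A minimal sketch follows. The 0.5 decision threshold and the toy vectors are hypothetical; in practice the threshold is tuned on development data.

```python
import math

def cosine_score(enrol, test):
    """Cosine similarity between an enrolment and a test i-vector."""
    dot = sum(a * b for a, b in zip(enrol, test))
    return dot / (math.sqrt(sum(a * a for a in enrol)) *
                  math.sqrt(sum(b * b for b in test)))

def verify(enrol, test, threshold=0.5):
    """Accept the trial if the cosine score exceeds a hypothetical threshold."""
    return cosine_score(enrol, test) > threshold

enrol = [0.6, 0.8, 0.0]
print(round(cosine_score(enrol, [0.6, 0.8, 0.0]), 3))  # 1.0
print(verify(enrol, [0.0, 0.0, 1.0]))                   # False
```

Because the score depends only on the angle between the two vectors, it needs no speaker labels. PLDA, by contrast, must be trained on labeled background data.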