2014
DOI: 10.21236/ada613971

A Novel Scheme for Speaker Recognition Using a Phonetically-Aware Deep Neural Network

Abstract: We propose a novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for automatic speech recognition (ASR). Specifically, the DNN replaces the standard Gaussian mixture model (GMM) to produce frame alignments. The use of an ASR-DNN system in the speaker recognition pipeline is attractive as it integrates the information from speech content directly into the statistics, allowing the standard ba…

Cited by 289 publications (300 citation statements)
References 11 publications

“…The focus of the speaker recognition community is mainly concerned with the monaural or single-channel case as speaker recognition has traditionally been applied to telephone speech. Major advances on increasing the robustness of speaker recognition include the introduction of the i-vector framework based on Gaussian mixture models (GMM/i-vector) [1], the Probabilistic Linear Discriminant Analysis (PLDA) back-end [2], Deep Neural Networks (DNNs) replacing the GMM component [3], and recently introduced x-vectors [4] for speaker embedding. X-vectors are of particular interest due to their use of inexpensive data augmentation for increasing robustness and utilization of the algorithms associated with i-vectors, e. g. PLDA for scoring.…”
Section: Introduction (mentioning)
confidence: 99%
“…This method is for classification only. The deep neural network is a standard feed‐forward neural network that is both much larger and much deeper than traditional neural networks (Lei, Scheffer, Ferrer, & Mclaren, ) and it has recently led to significant improvement in countless areas of machine learning (Gu & Rigazio, ).…”
Section: Methods (mentioning)
confidence: 99%
“…It is written as $M = \mu + Tw$, where $\mu$ is the mean super-vector, $T$ is the low-rank total variability matrix and $w$ is a low-rank vector referred to as the i-vector. A PLDA model is then employed to generate verification scores by comparing i-vectors from different utterances [7]. Any model that can provide posteriors of K classes per frame other than GMM can be used for i-vector calculation.…”
Section: A GMM-UBM and i-vector Based System (mentioning)
confidence: 99%
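
The statement above notes that any model producing per-frame posteriors over K classes can drive i-vector calculation. As a rough illustration, the sketch below accumulates the zeroth- and first-order sufficient statistics from such posteriors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `sufficient_stats` and the toy inputs are hypothetical, and the posteriors are simply taken as given, whether they come from a GMM or from the outputs of an ASR-trained DNN.

```python
from typing import List, Tuple


def sufficient_stats(
    posteriors: List[List[float]],
    features: List[List[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth- and first-order statistics for one utterance.

    posteriors[t][k] is the posterior of class k at frame t (hypothetical
    input; any frame-level classifier could supply it).
    features[t][d] is the d-th coefficient of the feature vector at frame t.
    """
    if len(posteriors) != len(features):
        raise ValueError("posteriors and features must have one row per frame")
    if not posteriors:
        raise ValueError("at least one frame is required")

    num_classes = len(posteriors[0])
    dim = len(features[0])

    # N_k = sum_t gamma_t(k): soft count of frames aligned to class k.
    zeroth = [0.0] * num_classes
    # F_k = sum_t gamma_t(k) * x_t: posterior-weighted sum of features.
    first = [[0.0] * dim for _ in range(num_classes)]

    for gamma_t, x_t in zip(posteriors, features):
        for k, gamma in enumerate(gamma_t):
            zeroth[k] += gamma
            for d, x in enumerate(x_t):
                first[k][d] += gamma * x

    return zeroth, first


# Toy example: 3 frames, 2 classes, 2-dimensional features.
toy_posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
toy_features = [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]]
toy_zeroth, toy_first = sufficient_stats(toy_posteriors, toy_features)
print(toy_zeroth)  # [1.5, 1.5]
print(toy_first)   # [[4.0, 1.0], [8.0, 5.0]]
```

Only the source of `posteriors` changes when a different alignment model is used; the accumulation itself stays the same, which is why the rest of an i-vector pipeline can be left untouched in this sketch.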