A novel scheme for speaker recognition using a phonetically-aware deep neural network

Lei, Yun; Scheffer, Nicolas; Ferrer, Luciana; McLaren, Mitchell

doi:10.1109/icassp.2014.6853887

Cited by 367 publications (166 citation statements)

References 9 publications

“…These [DNN state posteriors] are used to compute SS [sufficient statistics] using the feature vectors of an utterance. This approach achieved significant improvements over a baseline i-vector system (Lei et al., 2014).…”

Section: DNN-Based System (mentioning; confidence: 93%)

“…Training a PLDA model for the SV [speaker verification] task uses speaker labels to define a set of classes to be discriminated. It is common to have multiple instances of speaker-labelled i-vectors available for large text-independent datasets (Romero and McCree, 2014; Lei et al., 2014). For a text-dependent scenario, the outcome of the task is linked to identifying both content and speaker.…”

Section: PLDA Projection Features (mentioning; confidence: 99%)

“…In the past, several studies have suggested that integrating linguistic information into speaker recognition systems can be useful (Lei et al., 2014; Park and Hazen, 2002; Sturim et al., 2002; …). In HMM/DNN automatic speech recognition (Lei et al., 2014), state posterior probabilities are obtained at the output of the DNN acoustic model. These are used to compute SS [sufficient statistics] using the feature vectors of an utterance.…”

Section: DNN-Based System (mentioning; confidence: 99%)

“…Recent work uses Deep Neural Network (DNN)-based Sufficient Statistics (SS) to compute i-vectors (Lei et al., 2014). Unlike the conventional GMM-UBM, DNNs are trained in a supervised manner using phonetic classes obtained by forced alignment of the training data, usually with the Hidden Markov Model (HMM)/GMM acoustic models of an Automatic Speech Recognition (ASR) system.…”

Section: Content or Linguistic Information Is Relevant to Text-depend… (mentioning; confidence: 99%)

Template-matching for text-dependent speaker verification

Dey, Motlíček, Madikeri, et al. (2017), Speech Communication

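The snippets quoted from Dey et al. (2017) describe the core of the DNN-based scheme: frame-level state posteriors from a DNN acoustic model take the place of GMM-UBM component posteriors when accumulating the sufficient statistics used for i-vector extraction. The NumPy sketch below shows only that accumulation step, N_c = sum_t gamma_t(c) and F_c = sum_t gamma_t(c) x_t, with optional centring F~_c = F_c - N_c m_c. It assumes the posteriors are already available as a (T, C) array; the function name, argument layout and the source of the class means m_c are illustrative assumptions, not details taken from Lei et al. (2014).

```python
import numpy as np


def dnn_sufficient_stats(features, posteriors, means=None):
    """Zeroth- and first-order Baum-Welch statistics for one utterance.

    features:   (T, D) acoustic feature vectors x_t.
    posteriors: (T, C) per-frame class posteriors gamma_t(c). In a
                DNN-based system these would be the DNN's state (senone)
                outputs; in a GMM-UBM system, component responsibilities.
    means:      optional (C, D) class means m_c used to centre the
                first-order statistics (assumed to be estimated elsewhere).
    Returns (N, F) with shapes (C,) and (C, D).
    """
    x = np.asarray(features, dtype=float)
    gamma = np.asarray(posteriors, dtype=float)
    if x.shape[0] != gamma.shape[0]:
        raise ValueError("features and posteriors need the same number of frames")
    n = gamma.sum(axis=0)            # N_c = sum_t gamma_t(c)
    f = gamma.T @ x                  # F_c = sum_t gamma_t(c) x_t
    if means is not None:
        # Centred statistics: F~_c = F_c - N_c m_c
        f = f - n[:, None] * np.asarray(means, dtype=float)
    return n, f
```

Because the posteriors sum to one in every frame, the zeroth-order statistics always sum to the number of frames, whichever model produced the posteriors; only the meaning of the classes c changes (mixture components versus phonetic states).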

“…The i-vector model, together with various normalization approaches, provides the standard framework for modern speaker verification systems [1], [2], [3], [4]. The i-vector model uses a Gaussian mixture model (GMM) or a deep neural network (DNN) to collect the Baum-Welch sufficient statistics of an utterance and then projects them onto a low-dimensional total variability space.…”

Section: Introduction (mentioning; confidence: 99%)

Local training for PLDA in speaker verification

Zhao, Wang, et al. (2016), 2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases And

Abstract: PLDA is a popular normalization approach for the i-vector model, and it has delivered state-of-the-art performance in speaker verification. However, PLDA training requires a large amount of labeled development data, which is highly expensive in most cases. One way to mitigate the problem is to use unsupervised adaptation methods, which use unlabeled data to adapt the PLDA scatter matrices to the target domain. In this paper, we present a new 'local training' approach that utilizes inaccurate but much cheaper local labels to train the PLDA model. These local labels discriminate speakers within a single conversation only, and so are much easier to obtain than the normal 'global labels'. Our experiments show that the proposed approach can deliver significant performance improvement, particularly with limited globally-labeled data.
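The snippet from Zhao et al. (2016) summarizes the i-vector model as collecting Baum-Welch statistics and projecting them onto a low-dimensional total variability space. A minimal NumPy sketch of that projection, using the standard posterior-mean estimate w = (I + T' Sigma^-1 N T)^-1 T' Sigma^-1 F~, is given below. It assumes a total variability matrix T and a diagonal residual covariance Sigma have already been trained, and that the rows of T are ordered class by class; training T is not shown, and the function name is hypothetical.

```python
import numpy as np


def extract_ivector(n, f_centred, t_matrix, sigma_diag):
    """Posterior-mean i-vector from centred Baum-Welch statistics.

    n:          (C,) zeroth-order statistics N_c.
    f_centred:  (C, D) centred first-order statistics F~_c.
    t_matrix:   (C*D, R) total variability matrix T, rows ordered class by
                class (assumed already trained).
    sigma_diag: (C*D,) diagonal of the residual covariance Sigma (assumed
                already trained).
    Returns w = (I + T' Sigma^-1 N T)^-1 T' Sigma^-1 F~.
    """
    n = np.asarray(n, dtype=float)
    f_centred = np.asarray(f_centred, dtype=float)
    t_matrix = np.asarray(t_matrix, dtype=float)
    sigma_diag = np.asarray(sigma_diag, dtype=float)
    num_classes, dim = f_centred.shape
    rank = t_matrix.shape[1]
    # N is block-diagonal: N_c repeated over the D feature dimensions of class c.
    n_expanded = np.repeat(n, dim)
    t_scaled = t_matrix / sigma_diag[:, None]        # Sigma^-1 T
    precision = np.eye(rank) + t_matrix.T @ (n_expanded[:, None] * t_scaled)
    linear = t_scaled.T @ f_centred.reshape(num_classes * dim)
    # Solve the R x R system instead of inverting the precision matrix.
    return np.linalg.solve(precision, linear)
```

Solving the small R x R system is cheaper and numerically safer than forming the inverse explicitly; the expensive part in practice is the product T' Sigma^-1 N T, which depends on the utterance through N.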

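PLDA training, as the quoted snippets and the Zhao et al. abstract note, relies on labelled i-vectors to estimate scatter matrices. The sketch below computes the within-class and between-class covariance of labelled i-vectors, the two quantities that LDA and the two-covariance form of PLDA model. Class keys are arbitrary hashable values; the usage example keys each i-vector by a hypothetical (conversation, local label) pair purely to illustrate what conversation-scoped "local labels" could look like. This is not the local-training procedure of Zhao et al., which the abstract does not specify.

```python
from collections import defaultdict

import numpy as np


def scatter_matrices(ivectors, labels):
    """Within-class (S_w) and between-class (S_b) covariance of labelled i-vectors.

    ivectors: (M, R) array, one i-vector per row.
    labels:   length-M sequence of hashable class keys, e.g. speaker IDs.
    Both matrices are normalised by M, so S_w + S_b equals the (biased)
    total covariance of the i-vectors.
    """
    x = np.asarray(ivectors, dtype=float)
    if len(labels) != x.shape[0]:
        raise ValueError("need exactly one label per i-vector")
    groups = defaultdict(list)
    for row, key in enumerate(labels):
        groups[key].append(row)

    global_mean = x.mean(axis=0)
    dim = x.shape[1]
    s_w = np.zeros((dim, dim))
    s_b = np.zeros((dim, dim))
    for rows in groups.values():
        members = x[rows]
        class_mean = members.mean(axis=0)
        dev = members - class_mean
        s_w += dev.T @ dev                      # spread around the class mean
        offset = (class_mean - global_mean)[:, None]
        s_b += len(rows) * (offset @ offset.T)  # spread of the class means
    m = x.shape[0]
    return s_w / m, s_b / m


if __name__ == "__main__":
    # Hypothetical conversation-scoped labels: ("conv1", "A") and ("conv2", "A")
    # are treated as different classes even if they are the same person.
    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(6, 2))
    keys = [("conv1", "A"), ("conv1", "A"), ("conv1", "B"),
            ("conv2", "A"), ("conv2", "A"), ("conv2", "B")]
    s_w, s_b = scatter_matrices(vecs, keys)
    print(s_w.shape, s_b.shape)
```

Since S_w + S_b is fixed by the data, splitting one speaker's i-vectors across several classes shifts variance from S_w into S_b. This is a sketch-level observation about the estimator, not a result reported in the paper.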

The deep multichannel discrete‐time cellular neural network model for classification

Abtioglu and Yalçın (2022), Circuit Theory & Apps

Summary: High latency and power consumption are two major problems that need to be addressed in convolutional neural networks (CNNs). In this paper, the convolutional layer is replaced with a discrete‐time cellular neural network (CellNN) to overcome these problems. Multiple configurations of CellNNs are trained with the TensorFlow framework to classify objects from the CIFAR‐10 database. The effects of the number of iterations, the number of channels, batch normalization, and activation functions on classification accuracy are presented. It is shown that TensorFlow is capable of training discrete‐time CellNNs. Although the accuracies of the proposed networks on CIFAR‐10 are slightly lower than those of existing CNNs, their reduced number of parameters and multiply‐accumulates (MACs) means that their power consumption and computation time will be lower than those of CNNs.
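The summary above describes replacing a convolutional layer with a discrete-time CellNN iterated for a configurable number of steps. As a rough illustration only, the NumPy sketch below runs the forward pass of a single-channel discrete-time CellNN using a common textbook form of the update, x(k+1) = A * y(k) + B * u + z with y = f(x), where A and B are small feedback and control templates, * is a local 2-D template operation, and z is a bias. The saturating output f(x) = clip(x, -1, 1), the zero initial state and all function names are assumptions; the paper's multichannel TensorFlow layers, and how their templates are trained, are not reproduced here.

```python
import numpy as np


def correlate_same(image, kernel):
    """2-D 'same'-size cross-correlation with zero padding (odd kernel sizes)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros_like(image, dtype=float)
    rows, cols = image.shape
    for i in range(kh):
        for j in range(kw):
            # Each template weight scales a shifted copy of the padded input.
            out += kernel[i, j] * padded[i:i + rows, j:j + cols]
    return out


def dt_cellnn(u, a, b, bias, iterations):
    """Forward pass of a single-channel discrete-time CellNN (assumed form).

    x(k+1) = A * y(k) + B * u + bias,  y(k) = clip(x(k), -1, 1),
    starting from a zero state; returns y after `iterations` updates.
    """
    u = np.asarray(u, dtype=float)
    x = np.zeros_like(u)
    # The control term B * u + bias does not change between iterations.
    feedforward = correlate_same(u, b) + bias
    for _ in range(iterations):
        y = np.clip(x, -1.0, 1.0)
        x = correlate_same(y, a) + feedforward
    return np.clip(x, -1.0, 1.0)
```

The number of iterations plays a role similar to depth: each extra step lets the feedback template A spread information one more template radius across the image, with no additional parameters.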
