Optimisation of neural models for speaker identification

Oglesby, J.; Mason, John S.

doi:10.1109/icassp.1990.115617

Cited by 45 publications

(35 citation statements)

References 8 publications


“…The i-vector-PLDA technique and its variants have also been successfully used in text-dependent speaker recognition tasks [8,9,10]. In past studies, neural networks have been investigated for speaker recognition [11,12]. Being nonlinear classifiers, neural networks can discriminate the characteristics of different speakers.…”

Section: Previous Work (mentioning)

confidence: 99%

Deep neural networks for small footprint text-dependent speaker verification

Variani, Lei, McDermott, et al. 2014

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


In this paper we investigate the use of deep neural networks (DNNs) for a small footprint text-dependent speaker verification task. At the development stage, a DNN is trained to classify speakers at the frame level. During speaker enrollment, the trained DNN is used to extract speaker-specific features from the last hidden layer. The average of these speaker features, or d-vector, is taken as the speaker model. At the evaluation stage, a d-vector is extracted for each utterance and compared to the enrolled speaker model to make a verification decision. Experimental results show the DNN-based speaker verification system achieves good performance compared to a popular i-vector system on a small footprint text-dependent speaker verification task. In addition, the DNN-based system is more robust to additive noise and outperforms the i-vector system at low False Rejection operating points. Finally, the combined system outperforms the i-vector system by 14% and 25% relative in equal error rate (EER) for clean and noisy conditions respectively. Index Terms: deep neural networks, speaker verification.
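The enrollment and evaluation steps described in this abstract can be illustrated with a minimal sketch. It assumes the per-frame activations of the last hidden layer are already available as plain lists of floats. The cosine-similarity comparison, the `threshold` value, and the function names `d_vector`, `cosine_similarity`, and `verify` are illustrative assumptions, not details taken from the abstract.

```python
import math

def d_vector(frame_features):
    """Average per-frame hidden-layer activations into a single vector."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(frame[i] for frame in frame_features) / n for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed scoring function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0

def verify(enrolled_model, test_frames, threshold=0.8):
    """Accept the claim if the utterance d-vector is close to the enrolled model.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data.
    """
    return cosine_similarity(enrolled_model, d_vector(test_frames)) >= threshold

# Toy two-dimensional "activations" standing in for real DNN outputs.
enrolled = d_vector([[1.0, 0.0], [0.8, 0.2]])
same_speaker = verify(enrolled, [[0.9, 0.1]])
other_speaker = verify(enrolled, [[0.0, 1.0]])
```

Averaging over frames gives a fixed-size speaker model regardless of utterance length, which is what allows the enrolled model and a test utterance to be compared directly.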


“…In the speaker verification mode, the input vectors of the unknown user are fed forward through the network belonging to the claimed speaker. If the average output value is bigger than a threshold, the speaker is accepted (Oglesby and Mason, 1990). Rudasi and Zahorian (1991) demonstrated that by using small binary networks for distinguishing between two speakers instead of one large network with one output for each known speaker, the performance in speaker recognition was much better, since the binary networks were much more specialised.…”

Section: Gaussian Mixture Models (mentioning)

confidence: 99%
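The verification rule quoted above (feed the input vectors through the claimed speaker's network, average the outputs, compare with a threshold) can be sketched as follows. The single-sigmoid-unit `make_toy_network` is only a hypothetical stand-in for a trained per-speaker network, and the threshold of 0.5 is an assumed value.

```python
import math

def average_output(network, frames):
    """Mean network output over all input vectors of the utterance."""
    return sum(network(frame) for frame in frames) / len(frames)

def accept_claim(networks, claimed_id, frames, threshold=0.5):
    """Accept the identity claim if the claimed speaker's network
    produces an average output above the threshold."""
    return average_output(networks[claimed_id], frames) > threshold

def make_toy_network(weights, bias):
    """Hypothetical stand-in for a trained network: one sigmoid unit."""
    def network(frame):
        z = sum(w * x for w, x in zip(weights, frame)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return network

# One network per enrolled speaker; "alice" is an illustrative identifier.
networks = {"alice": make_toy_network([2.0, -2.0], 0.0)}
genuine = accept_claim(networks, "alice", [[1.0, 0.0], [0.9, 0.1]])
impostor = accept_claim(networks, "alice", [[0.0, 1.0], [0.1, 0.9]])
```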

Fusing prosodic and acoustic information for speaker recognition

Farrús

2009

IJSLL


PhD dissertation: Fusing Prosodic and Acoustic Information for Speaker Recognition. Mireia Farrús i Cabeceran. Abstract: Automatic speaker recognition is the use of a machine to identify an individual from a spoken sentence. Recently, this technology has seen increasing use in applications such as access control, transaction authentication, law enforcement, forensics, and system customisation, among others. One of the central questions addressed by this field is what it is in the speech signal that conveys speaker identity. Traditionally, automatic speaker recognition systems have relied mostly on short-term features related to the spectrum of the voice. However, human speaker recognition relies on other sources of information; therefore, there is reason to believe that these sources can also play an important role ...


“…Theoretically, any multicategory classification task can be decomposed into a set of binary classification subtasks, where each subtask is to discriminate between the data belonging to a specific class and all the others. By this fact, some connectionist methods have been proposed by constructing a set of neural networks with binary outputs for speaker identification [13], [21]. Indeed, those neural networks of binary outputs may work in a parallel way, which speeds up training.…”

Section: Introduction (mentioning)

confidence: 99%
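The decomposition described in this statement, one binary subtask per class that separates it from all the others, can be sketched in a few lines. The helper names `one_vs_rest_labels` and `identify`, and the rule of picking the speaker whose binary scorer responds most strongly, are assumptions made for illustration; the cited systems may combine their binary networks differently.

```python
def one_vs_rest_labels(labels):
    """Turn multiclass labels into one binary target list per class:
    1 for samples of that class, 0 for all others."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

def identify(binary_scorers, frame):
    """Pick the class whose binary scorer gives the highest output
    (an assumed combination rule)."""
    return max(binary_scorers, key=lambda c: binary_scorers[c](frame))

targets = one_vs_rest_labels(["a", "b", "a", "c"])

# Hypothetical scorers standing in for trained binary networks.
scorers = {
    "a": lambda frame: frame[0],
    "b": lambda frame: frame[1],
    "c": lambda frame: frame[2],
}
best = identify(scorers, [0.1, 0.7, 0.2])
```

Because each binary target list depends only on the shared labels, the per-class networks can be trained independently, which is what makes the parallel training mentioned in the statement possible.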

“…In previous studies, connectionist approaches have been applied to build speaker identification systems [2], [3], [7], [9], [13], [21] where a neural-network model is typically used to characterize all the speakers' voice in a given set. In this circumstance, the input space of a neural network is composed of feature vectors extracted from acoustic signals belonging to all the speakers, while the outputs are usually labels of corresponding speaker identities.…”

Section: Introduction (mentioning)

confidence: 99%

Capture interspeaker information with a neural network for speaker identification

Wang, Chen, Chi 2002

IEEE Trans. Neural Netw.


Abstract: The model-based approach is one of the methods widely used for speaker identification, where a statistical model is used to characterize a specific speaker's voice but no interspeaker information is involved in its parameter estimation. It is observed that interspeaker information is very helpful in discriminating between different speakers. In this paper, we propose a novel method for the use of interspeaker information to improve the performance of a model-based speaker identification system. A neural network is employed to capture the interspeaker information from the output space of those statistical models. In order to sufficiently utilize interspeaker information, a rival penalized encoding rule is proposed to design supervised learning pairs. Moreover, for better generalization, a query-based learning algorithm is presented to actively select the input data of interest during training of the neural network. Comparative results on the KING speech corpus show that our method leads to a considerable improvement for a model-based speaker identification system. Index Terms: interspeaker information, KING speech corpus, model-based method, neural networks, query-based learning algorithm, rival penalized encoding scheme, speaker identification.
