Most state-of-the-art Deep Learning (DL) approaches for speaker recognition work at the short-utterance level. Given the speech signal, these algorithms extract a sequence of speaker embeddings from short segments, which are then averaged to obtain an utterance-level speaker representation. In this work we propose the use of an attention mechanism to obtain a discriminative speaker embedding from speech utterances of variable length. Our system is based on a Convolutional Neural Network (CNN) that encodes short-term speaker features from the spectrogram and a self multi-head attention model that maps these representations into a long-term speaker embedding. The proposed attention model produces multiple alignments from different subsegments of the CNN encoded states over the sequence. This mechanism therefore works as a pooling layer that selects the most discriminative features over the sequence to obtain an utterance-level representation. We have tested this approach on the verification task with the VoxCeleb1 dataset. The results show that self multi-head attention outperforms both temporal and statistical pooling methods, with an 18% relative improvement in EER. The obtained results also show a 58% relative improvement in EER compared to i-vector+PLDA.
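To make the pooling mechanism concrete, the following is a minimal PyTorch sketch of self multi-head attention pooling as described above: frame-level CNN encodings are split into per-head slices, each head computes its own alignment over time, and the per-head context vectors are concatenated into a single utterance-level embedding. The class name, the learnable per-head query parameterization, and the scaling are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfMultiHeadAttentionPooling(nn.Module):
    """Sketch of self multi-head attention pooling over CNN frame features."""

    def __init__(self, dim, num_heads):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # One learnable query vector per head (hypothetical parameterization).
        self.query = nn.Parameter(torch.randn(num_heads, self.head_dim))

    def forward(self, x):
        # x: (batch, time, dim) frame-level encodings from the CNN front-end
        b, t, _ = x.shape
        x = x.view(b, t, self.num_heads, self.head_dim)             # split into heads
        scores = torch.einsum("bthd,hd->bth", x, self.query)        # per-head alignment scores
        weights = F.softmax(scores / self.head_dim ** 0.5, dim=1)   # attention over time, per head
        context = torch.einsum("bth,bthd->bhd", weights, x)         # per-head context vectors
        return context.reshape(b, -1)                               # concat heads -> utterance embedding

# Example usage (shapes are illustrative):
pooling = SelfMultiHeadAttentionPooling(dim=512, num_heads=8)
frames = torch.randn(4, 300, 512)   # 4 utterances, 300 frames, 512-dim features
embedding = pooling(frames)         # -> (4, 512) utterance-level speaker embeddings
```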
Most state-of-the-art Deep Learning systems for text-independent speaker verification are based on speaker embedding extractors. These architectures are commonly composed of a feature-extractor front-end together with a pooling layer that encodes variable-length utterances into fixed-length speaker vectors. In this paper we present Double Multi-Head Attention (MHA) pooling, which extends our previous approach based on Self MHA. An additional self-attention layer is added to the pooling layer to summarize the context vectors produced by MHA into a unique speaker representation. This method enhances the pooling mechanism by weighting the information captured by each head, and it results in more discriminative speaker embeddings. We have evaluated our approach on the VoxCeleb2 dataset. Our results show 6.09% and 5.23% relative improvements in terms of EER compared to Self Attention pooling and Self MHA, respectively. According to the obtained results, Double MHA is an effective approach for selecting the most relevant features captured by CNN-based front-ends from the speech signal.
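A possible reading of the second attention stage is sketched below: the per-head context vectors from the self MHA layer are scored by an additional learnable query and combined into a single weighted speaker representation, rather than simply concatenated. It reuses the SelfMultiHeadAttentionPooling class from the sketch above; the head-level query and the weighted-sum output are assumptions made for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoubleMHAPooling(nn.Module):
    """Sketch of Double MHA pooling: a second attention stage over the heads."""

    def __init__(self, dim, num_heads):
        super().__init__()
        # First stage: self multi-head attention pooling (defined in the sketch above).
        self.mha = SelfMultiHeadAttentionPooling(dim, num_heads)
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Learnable query scoring the relevance of each head (assumed form).
        self.head_query = nn.Parameter(torch.randn(self.head_dim))

    def forward(self, x):
        # x: (batch, time, dim) frame-level encodings from the CNN front-end
        b = x.size(0)
        heads = self.mha(x).view(b, self.num_heads, self.head_dim)   # per-head context vectors
        scores = torch.einsum("bhd,d->bh", heads, self.head_query)   # relevance score per head
        weights = F.softmax(scores, dim=1)                           # attention over heads
        return torch.einsum("bh,bhd->bd", weights, heads)            # weighted summary -> speaker vector
```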