Houjun Huang scite author profile

Recently, speaker embeddings extracted from a speaker-discriminative deep neural network (DNN) have outperformed conventional methods such as the i-vector. In most cases, the DNN speaker classifier is trained with a softmax cross-entropy loss. However, this loss does not explicitly encourage inter-class separability or intra-class compactness, so the resulting embeddings are not optimal for speaker recognition. To address this issue, this paper introduces to deep speaker embedding learning three margin-based losses that not only separate classes but also enforce a fixed margin between them. We demonstrate that the margin is the key to obtaining more discriminative speaker embeddings. Experiments are conducted on two public text-independent tasks: VoxCeleb1 and Speakers in the Wild (SITW). The proposed approach achieves state-of-the-art performance, reducing the equal error rate (EER) by 25% to 30% on both tasks relative to strong baselines trained with softmax cross-entropy, and obtaining 2.238% EER on the VoxCeleb1 test set and 2.761% EER on the SITW core-core test set.

Index Terms: speaker recognition, speaker embeddings, angular softmax, additive margin softmax, additive angular margin loss
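To make the idea of "demanding a margin" concrete, below is a minimal NumPy sketch of two of the losses named in the index terms, using their standard textbook formulations: additive margin softmax replaces the target-class cosine cos(θ_y) with cos(θ_y) − m, and additive angular margin loss replaces it with cos(θ_y + m); all logits are then scaled by s. The function names, the weight layout (embedding dimension × number of classes) and the values of `s` and `m` are illustrative assumptions, not the paper's settings or code. Angular softmax (cos(mθ_y) with a piecewise extension) is omitted for brevity.

```python
import numpy as np


def _cosine_matrix(embeddings, weights):
    # L2-normalise each embedding (row) and each class weight (column) so
    # that their dot product is the cosine of the angle between them.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    return x @ w  # shape: (batch, num_classes)


def margin_softmax_loss(embeddings, weights, labels, s=30.0, m=0.2, kind="am"):
    """Hypothetical helper: mean cross-entropy over margin-adjusted cosine logits.

    kind="am"   -> additive margin softmax:      s * (cos(theta_y) - m)
    kind="aam"  -> additive angular margin loss: s * cos(theta_y + m)
    kind="none" -> normalised softmax without margin, for comparison
    """
    labels = np.asarray(labels)
    cos = np.clip(_cosine_matrix(embeddings, weights), -1.0, 1.0)
    rows = np.arange(len(labels))
    target = cos[rows, labels]

    if kind == "am":
        target_m = target - m
    elif kind == "aam":
        target_m = np.cos(np.arccos(target) + m)
    elif kind == "none":
        target_m = target
    else:
        raise ValueError(f"unknown kind: {kind}")

    # Only the target-class logit is penalised; other classes keep s * cos.
    logits = s * cos
    logits[rows, labels] = s * target_m

    # Numerically stable log-softmax, then negative log-likelihood.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()
```

Because the margin lowers the target logit, an embedding that is already perfectly aligned with its class weight still incurs a noticeably larger loss than under plain softmax, which keeps pushing classes apart during training. For example, with three orthogonal class weights, two perfectly aligned embeddings, `s=4.0` and `m=0.5`, the loss ordering is no margin < additive angular margin < additive margin.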


Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

Xiang, Wang, Huang, et al. 2019

Preprint


DeepVein: Novel finger vein verification methods based on Deep Convolutional Neural Networks

Huang, Liu, et al. 2017


AISpeech-SJTU Accent Identification System for the Accented English Speech Recognition Challenge

Huang, Xiang, Yang, et al. 2021


This paper describes the AISpeech-SJTU system for the accent identification track of the Interspeech 2020 Accented English Speech Recognition Challenge. In this track, only 160 hours of accented English data collected from 8 countries, plus the auxiliary LibriSpeech dataset, are provided for training. To build an accurate and robust accent identification system, we explore the whole system pipeline in detail. First, we introduce the ASR-based phone posteriorgram (PPG) feature to accent identification and verify its efficacy. Then, we design a novel TTS-based approach that, for the first time, augments the very limited accent training data. Finally, we propose test-time augmentation and embedding fusion schemes to further improve system performance. Our final system ranked first in the challenge, outperforming all other participants by a large margin: it achieves 83.63% average accuracy on the challenge evaluation data, more than 10% absolute ahead of the others.


AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge

Huang, Xiang, Yang, et al. 2021

Preprint

