Houjun Huang scite author profile

Recently, speaker embeddings extracted from a speaker-discriminative deep neural network (DNN) have yielded better performance than conventional methods such as i-vector. In most cases, the DNN speaker classifier is trained using cross entropy loss with softmax. However, this kind of loss function does not explicitly encourage inter-class separability and intra-class compactness. As a result, the embeddings are not optimal for speaker recognition tasks. In this paper, to address this issue, three different margin-based losses, which not only separate classes but also demand a fixed margin between classes, are introduced to deep speaker embedding learning. It is demonstrated that the margin is the key to obtaining more discriminative speaker embeddings. Experiments are conducted on two public text-independent tasks: VoxCeleb1 and Speakers in the Wild (SITW). The proposed approach achieves state-of-the-art performance, with a 25% to 30% equal error rate (EER) reduction on both tasks when compared to strong baselines using cross entropy loss with softmax, obtaining 2.238% EER on the VoxCeleb1 test set and 2.761% EER on the SITW core-core test set, respectively.

Index Terms: speaker recognition, speaker embeddings, angular softmax, additive margin softmax, additive angular margin loss
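To make the role of the margin concrete, here is a minimal sketch of one of the listed losses, additive margin softmax, for a single training example. It is not the authors' implementation: the function names, the margin of 0.2, and the scale of 30 are illustrative assumptions. The idea is that the target-class cosine must exceed the other cosines by at least the margin before the loss becomes small.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def am_softmax_loss(embedding, class_weights, label, margin=0.2, scale=30.0):
    """Cross entropy over scaled cosines, with `margin` subtracted
    from the target-class cosine only (margin and scale are assumed values)."""
    x = l2_normalize(embedding)
    cosines = [sum(a * b for a, b in zip(x, l2_normalize(w)))
               for w in class_weights]
    logits = [scale * (c - margin if k == label else c)
              for k, c in enumerate(cosines)]
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]

# Hypothetical two-speaker example: the embedding leans towards speaker 0.
embedding = [0.6, 0.4]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
plain_loss = am_softmax_loss(embedding, class_weights, label=0, margin=0.0)
margin_loss = am_softmax_loss(embedding, class_weights, label=0, margin=0.2)
```

With `margin=0.0` the function reduces to softmax cross entropy over normalised, scaled cosines. A positive margin raises the loss for the same embedding, so training has to push the embedding further towards its own class.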


Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

Xiang, Wang, Huang et al. 2019

Preprint


DeepVein: Novel finger vein verification methods based on Deep Convolutional Neural Networks

Huang, Liu et al. 2017


AISpeech-SJTU Accent Identification System for the Accented English Speech Recognition Challenge

Huang, Xiang, Yang et al. 2021


This paper describes the AISpeech-SJTU system for the accent identification track of the Interspeech-2020 Accented English Speech Recognition Challenge. In this challenge track, only 160 hours of accented English data collected from 8 countries and the auxiliary Librispeech dataset are provided for training. To build an accurate and robust accent identification system, we explore the whole system pipeline in detail. First, we introduce the ASR-based phone posteriorgram (PPG) feature to accent identification and verify its efficacy. Then, a novel TTS-based approach is carefully designed to augment the very limited accent training data for the first time. Finally, we propose test-time augmentation and embedding fusion schemes to further improve the system performance. Our final system is ranked first in the challenge and outperforms all the other participants by a large margin. The submitted system achieves 83.63% average accuracy on the challenge evaluation data, ahead of the others by more than 10% in absolute terms.


AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge

Huang, Xiang, Yang et al. 2021

Preprint


ICFVR 2017: 3rd international competition on finger vein recognition

Zhang, Huang, Zhang et al. 2017


In recent years, finger vein recognition has become an important sub-field in biometrics and has been applied to real-world applications. The development of finger vein recognition algorithms heavily depends on large-scale real-world data sets. In order to motivate research on finger vein recognition, we released the largest finger vein data set to date and hold finger vein recognition competitions based on our data set every year. In 2017, the International Competition on Finger Vein Recognition (ICFVR) was held jointly with IJCB 2017. 11 teams registered and 10 of them joined the final evaluation. The winner of this year dramatically improved the EER from 2.64% to 0.483% compared to the winner of last year. In this paper, we introduce the process and results of ICFVR 2017 and give insights on the development of state-of-the-art finger vein recognition algorithms.
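Several abstracts on this page report equal error rate (EER), the operating point at which the false accept rate equals the false reject rate. The sketch below shows one simple way to estimate it from verification scores. It is an illustration only, not the competition's scoring tool: the function name and the score lists are hypothetical, and the threshold sweep is deliberately naive.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Sweep every observed score as a threshold and return the error rate
    at the point where false accepts and false rejects are closest."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # Genuine trials scoring below the threshold are falsely rejected.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # Impostor trials scoring at or above the threshold are falsely accepted.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Hypothetical scores: higher means "more likely the same identity".
genuine = [0.9, 0.8, 0.7, 0.3]
impostor = [0.6, 0.4, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)
```

Here one genuine trial (0.3) scores below one impostor trial (0.6), so no threshold separates the two sets perfectly and the estimate lands where one of four trials is misclassified on each side. Production toolkits usually interpolate along the ROC curve instead of picking the nearest threshold.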


Feature recovery for noise‐robust speaker verification

Huang, Zhou et al. 2015

Electron. Lett.


Noisy conditions are an important extrinsic source of degradation affecting speaker verification system performance. A feature-recovery approach is proposed to eliminate noise-dependent variability in feature space. A frame of the noisy feature vector is recovered using information from the frame itself and from the neighbouring feature vectors. Experiments are conducted on noisy test sets for text-dependent speaker verification tasks, and the results indicate that the system can achieve significant performance improvement by using recovered feature vectors.


Voice biometrics using linear Gaussian model

Yang, Huang et al. 2014

IET Biom.


This study introduces a linear Gaussian model-based framework for voice biometrics. The model works with discrete-time linear dynamical systems. The motivation of the study is to use the linear Gaussian modelling method in voice biometrics, and to show that the accuracy offered by this method is comparable with other state-of-the-art methods such as Probabilistic Linear Discriminant Analysis and the two-covariance model. An expectation-maximisation algorithm is derived to train the model and a Bayesian solution is used to calculate the log-likelihood ratio score of all trials of speakers. This approach performed well on the core-extended conditions of the NIST 2010 Speaker Recognition Evaluation, and is competitive with Gaussian probabilistic linear discriminant analysis in terms of normalised decision cost function.

