Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification

Zhang, Zhaofeng; Wang, Longbiao; Kai, Atsuhiko; Yamada, Takanori; Li, Weifeng; Iwahashi, Masahiro

doi:10.1186/s13636-015-0056-7

Cited by 52 publications

(20 citation statements)

References 49 publications


“…Accordingly, accuracy levels of proposed deep neural networks (DNN) for speaker recognition (both verification and identification) are far surpassing previous state-of-the-art techniques. Recent examples include the use of embeddings obtained from convolutional neural networks (CNN) for speaker recognition in [1,2,3], the use of auto-encoder models for speaker identification in [4,5], and a number of cases utilizing ResNet for both speaker recognition and identification in [6,7].…”

Section: Introduction (mentioning)

confidence: 99%

A Deep Neural Network for Short-Segment Speaker Recognition

Hajavi

Etemad

2019

Interspeech 2019


Today's interactive devices such as smart-phone assistants and smart speakers often deal with short-duration speech segments. As a result, speaker recognition systems integrated into such devices will be much better suited with models capable of performing the recognition task with short-duration utterances. In this paper, a new deep neural network, UtterIdNet, capable of performing speaker recognition with short speech segments is proposed. Our proposed model utilizes a novel architecture that makes it suitable for short-segment speaker recognition through an efficiently increased use of information in short speech segments. UtterIdNet has been trained and tested on the VoxCeleb datasets, the latest benchmarks in speaker recognition. Evaluations for different segment durations show consistent and stable performance for short segments, with significant improvement over the previous models for segments of 2 seconds, 1 second, and especially sub-second durations (250 ms and 500 ms).


“…In the future, we try to apply dereverberation methods [22,28,29,31] for distant-talking accent recognition and evaluate our proposed method on real-word distant-talking speech data.…”

Section: Discussion (mentioning)

confidence: 98%

Distant-talking accent recognition by combining GMM and DNN

Phapatanaburi

Wang

Ryota

et al. 2015

Multimed Tools Appl

Self Cite


Automatic accent recognition has recently received increasing attention. However, little research has focused on accent recognition in distant-talking environments, which is very important for improving distant-talking speech recognition performance with non-native accents. In this paper, we apply Gaussian Mixture Models (GMM) and Deep Neural Networks (DNN) to identify the speaker's accent in reverberant environments. A combination of the likelihoods from these two approaches is also proposed. In a reverberant environment, the accent recognition rate improved from 90.7 % with GMM to 93.0 % with DNN. The combination of GMM and DNN achieved a recognition rate of 97.5 %, outperforming the individual GMM and DNN because the two models are complementary. The relative error reduction is 73.1 % over the GMM-based method and 64.3 % over the DNN-based method.
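The relative error reductions quoted in this abstract can be checked from the three recognition rates it reports. The sketch below is only an arithmetic check, assuming the usual definition of relative error reduction (the drop in error rate divided by the baseline error rate); the function and variable names are illustrative and do not come from the paper.

```python
def relative_error_reduction(baseline_acc: float, improved_acc: float) -> float:
    """Relative error reduction in percent, given two accuracies in percent.

    Assumed definition: (baseline error - improved error) / baseline error.
    """
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err


# Recognition rates reported in the abstract (percent).
gmm_acc, dnn_acc, combined_acc = 90.7, 93.0, 97.5

rer_vs_gmm = relative_error_reduction(gmm_acc, combined_acc)
rer_vs_dnn = relative_error_reduction(dnn_acc, combined_acc)

print(f"vs GMM: {rer_vs_gmm:.1f} %")  # error rate 9.3 % -> 2.5 %
print(f"vs DNN: {rer_vs_dnn:.1f} %")  # error rate 7.0 % -> 2.5 %
```

Under that definition the two values round to 73.1 % and 64.3 %, matching the figures in the abstract.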


“…Unlike in the speech recognition tasks where the DNNs are used to get enhanced features from noisy features, researchers more prefer to use a DNN or convolutional neural network (CNN) to generate noise robustness bottleneck feature directly in speaker verification tasks [185][186][187]. As shown in Figure 11, acoustic features or feature maps are used to train a DNN/CNN with a bottleneck layer which has less nodes and closes to the output layer.…”

Section: Speech Recognition and Verification For The Internet Of (mentioning)

confidence: 99%

A Survey on Machine Learning‐Based Mobile Big Data Analysis: Challenges and Applications

Xie

Song

et al. 2018

Wireless Communications and Mobile Computing


This paper attempts to identify the requirement and the development of machine learning-based mobile big data (MBD) analysis through discussing the insights of challenges in the mobile big data. Furthermore, it reviews the state-of-the-art applications of data analysis in the area of MBD. Firstly, we introduce the development of MBD. Secondly, the frequently applied data analysis methods are reviewed. Three typical applications of MBD analysis, namely, wireless channel modeling, human online and offline behavior analysis, and speech recognition in the Internet of Vehicles, are introduced, respectively. Finally, we summarize the main challenges and future development directions of mobile big data analysis.
