Distant-talking accent recognition by combining GMM and DNN

Phapatanaburi, Khomdet; Wang, Longbiao; Ryota, Sakagami; Zhang, Zhaofeng; Li, Ximin; Iwahashi, Masahiro

doi:10.1007/s11042-015-2935-4

Cited by 13 publications

(8 citation statements)

References 23 publications


“…Additionally, LSTM was used in a hybrid emotion inference model that was proposed for inferring user emotion in a real-world voice-dialogue application, and a recurrent autoencoder was proposed to pre-train the LSTM to improve accuracy [32]. Further, GMM and DNNs were combined to identify distant accents in reverberant environments [26]. The authors found that this combination of classifiers outperformed the individual GMM and DNNs classifiers.…”

Section: Related Work (mentioning)

confidence: 99%

Spectrogram based multi-task audio classification

Zeng, Mao, Peng et al. 2017, Multimed Tools Appl

135


Audio classification is regarded as a great challenge in pattern recognition. Although audio classification tasks are usually treated as independent, they are often related to each other, as with a speaker's accent and a speaker's identity. In this paper, we propose a Deep Neural Network (DNN)-based multi-task model that exploits such relationships and handles multiple audio classification tasks simultaneously. We call our model the gated Residual Networks (GResNets) model, since it integrates Deep Residual Networks (ResNets) with a gate mechanism, which extracts better representations between tasks than Convolutional Neural Networks (CNNs) do. Specifically, two multiplied convolutional layers replace two feed-forward convolutional layers in the ResNets. We tested our model on multiple audio classification tasks and found that our multi-task model achieves higher accuracy than task-specific models trained separately.


“…Although MP-aware DNN-based detection may provide a better performance than that of conventional DNN-based detection using only magnitude information, it may not work well due to the lack of feature-based resolution within low frequencies [17] and limited training data. In [18], a combination of GMM and DNN for distant-talking accent recognition was proposed to increase the accent recognition performance on limited training data. The result showed that the combination of these two different methods could improve the distant-accent recognition performance when compared to just an individual one.…”

Section: Proposed Combination of GMM and MP-aware DNN (mentioning)

confidence: 99%

“…The result showed that the combination of these two different methods could improve the distant-accent recognition performance when compared to just an individual one. Motivated by [18], a combination of CQCC-based GMM and MP-aware DNN was proposed to take advantage of these benefits of different classifications and features and was expected to achieve better performance. The probabilities obtained from the different systems are combined by the following equation.…”

Section: Proposed Combination of GMM and MP-aware DNN (mentioning)

confidence: 99%
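
The combining equation mentioned in the statement above is cut off in the excerpt and is not reproduced here. As a rough illustration only, one common way to combine the outputs of two classifiers is a weighted linear sum of their per-class scores. The sketch below assumes that form; the function names `fuse_scores` and `predict`, the weight `alpha`, and the softmax normalization of the GMM log-likelihoods are all hypothetical choices, not taken from the cited papers.

```python
import math


def fuse_scores(gmm_log_likelihoods, dnn_posteriors, alpha=0.5):
    """Weighted linear fusion of per-class scores (illustrative assumption).

    gmm_log_likelihoods: one log-likelihood per class from a GMM system.
    dnn_posteriors: one posterior probability per class from a DNN system.
    alpha: hypothetical weight on the GMM system, between 0 and 1.
    """
    if len(gmm_log_likelihoods) != len(dnn_posteriors):
        raise ValueError("both systems must score the same number of classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Turn GMM log-likelihoods into pseudo-posteriors with a numerically
    # stable softmax so both systems are on a comparable 0..1 scale.
    peak = max(gmm_log_likelihoods)
    exps = [math.exp(x - peak) for x in gmm_log_likelihoods]
    total = sum(exps)
    gmm_posteriors = [e / total for e in exps]

    # Weighted sum of the two systems' per-class scores.
    return [
        alpha * g + (1.0 - alpha) * d
        for g, d in zip(gmm_posteriors, dnn_posteriors)
    ]


def predict(fused_scores):
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused_scores)), key=fused_scores.__getitem__)
```

For example, `predict(fuse_scores([0.0, 0.0], [0.2, 0.8]))` picks class index 1, because the GMM is undecided and the DNN favors the second class. In practice the weight would be tuned on held-out data.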

“…The applied MP-aware DNN-based detection was compared with conventional DNN-based detection and baseline constant Q transform cepstral coefficients-based Gaussian mixture classification (CQCC-based GMM). Although the MP-aware DNN-based method may provide better discrimination detection than conventional DNN-based detection using only magnitude/phase information, it may not work well due to the lack of feature-based resolution within low frequencies [17] and limited training data [18]. To address this problem, a novel method was proposed; combining MP-aware DNN with CQCC-based GMM to improve the reliable detection decision.…”

Section: Introduction (mentioning)

confidence: 99%


Exploiting Magnitude and Phase Aware Deep Neural Network for Replay Attack Detection

Phapatanaburi, Buayai, Naktong et al. 2020, ECTI-EEC

Self Cite


The magnitude and phase aware deep neural network (MP-aware DNN), based on Fast Fourier Transform information, has recently received increasing attention in many speech applications. However, little attention has been paid to its use in replay attack detection, as developed for the automatic speaker verification spoofing and countermeasures challenge (ASVspoof 2017). This paper aims to investigate the MP-aware DNN as a speech classifier for distinguishing non-replayed (genuine) from replayed speech. Also, to exploit classifier complementarity and make the detection decision more reliable, we propose a novel method that combines the MP-aware DNN with standard replay attack detection (that is, constant Q transform cepstral coefficients-based Gaussian mixture model classification: CQCC-based GMM). Experiments are evaluated on ASVspoof 2017 using a standard measure of detection performance, the equal error rate (EER). The results showed that MP-aware DNN-based detection outperformed the conventional DNN method using only the magnitude/phase features. Moreover, we found that score combination of the CQCC-based GMM with the MP-aware DNN achieved additional improvement, indicating that the MP-aware DNN is very useful, especially when combined with the CQCC-based GMM for replay attack detection.
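
The equal error rate named in the abstract is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is a minimal approximation of that idea, not the official ASVspoof scoring tool: it sweeps every observed score as a threshold and reports the point where the two error rates are closest. The function name `equal_error_rate` and the convention that higher scores mean "more likely genuine" are assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, replay_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Assumes higher scores indicate genuine (non-replayed) speech.
    A trial is accepted as genuine when its score is >= the threshold.
    """
    if not genuine_scores or not replay_scores:
        raise ValueError("both score lists must be non-empty")

    thresholds = sorted(set(genuine_scores) | set(replay_scores))
    best_gap = float("inf")
    eer = None
    for threshold in thresholds:
        # False rejection: a genuine trial falls below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False acceptance: a replayed trial reaches the threshold.
        far = sum(s >= threshold for s in replay_scores) / len(replay_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest
            # and report their midpoint as the EER estimate.
            best_gap = gap
            eer = (far + frr) / 2.0
    return eer
```

With genuine scores `[0.9, 0.8, 0.7, 0.3]` and replay scores `[0.6, 0.4, 0.2, 0.1]`, the two error rates meet at a threshold of 0.6, giving an EER of 0.25. Perfectly separated score lists give 0.0.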


“…In this special issue, Ren et al [10] propose three integration schemes for robust distant-talking speech recognition which combine bottleneck feature extraction with dereverberation technique. As an accompanying paper by the same institution, Phapatanaburi et al [9] propose a combination of Gaussian Mixture Models (GMM) and Deep Neural Networks (DNNs) to identify the speaker accent in reverberant environments.…”

Section: Recognizing Humans and Understanding Their Behaviors (mentioning)

confidence: 99%