2016
DOI: 10.1007/s00521-016-2501-7
Speaker recognition with hybrid features from a deep belief network

Abstract: Learning representation from audio data has shown advantages over hand-crafted features such as Mel Frequency Cepstral Coefficients (MFCC) in many audio applications. In most representation learning approaches, connectionist systems have been used to learn and extract latent features from fixed-length data. In this paper, we propose an approach to combine the learne…
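The core idea in the abstract, pairing features learned by a connectionist model with hand-crafted MFCCs, can be illustrated with a minimal sketch. The sketch below is not the authors' implementation: it assumes librosa and scikit-learn are available and substitutes a single BernoulliRBM for the paper's deep belief network, purely to show how learned and hand-crafted frame features might be concatenated.

```python
# Minimal sketch: hybrid features = hand-crafted MFCCs concatenated with a
# representation learned by an unsupervised network. A single BernoulliRBM
# stands in for the paper's deep belief network (a simplifying assumption).
import numpy as np
import librosa
from sklearn.neural_network import BernoulliRBM

def hybrid_features(wav_path, n_mfcc=13):
    y, sr = librosa.load(wav_path, sr=16000)
    # Frame-level MFCCs: shape (n_frames, n_mfcc)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T
    # RBM inputs should lie roughly in [0, 1], so min-max scale the frames
    scaled = (mfcc - mfcc.min(0)) / (mfcc.max(0) - mfcc.min(0) + 1e-8)
    # Learn a latent representation of the same frames (unsupervised)
    rbm = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20)
    learned = rbm.fit_transform(scaled)
    # Hybrid feature vector per frame: learned + hand-crafted
    return np.hstack([learned, mfcc])
```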

Cited by 86 publications (42 citation statements)
References 17 publications (13 reference statements)
“…The robot presented in this work accounts for obstacles only. An interesting future work is to make the robot more intelligent by using deep learning approaches such as those in [18], [19], [20], [21]. In this way, the robot can be trained for speech data so that the robot identifies voices and responds to both speech and image signaling.…”
Section: Discussion
confidence: 99%
“…$\bar{x}_1$ is the mean of the first sample of size $n$, $\bar{x}_2$ is the mean of the second sample of a similar size, and $SD_{pooled}$ is the pooled standard deviation of the two samples, given as $SD_{pooled} = \sqrt{(SD_1^2 + SD_2^2)/2}$ (15), where "$SD_1$ is the standard deviation of the first sample of size $n$ and $SD_2$ is the standard deviation of the second sample of equal size" [35]. Table 2.…”
Section: Speaker Identification Algorithm Based on Cascaded GMM-DNN
confidence: 99%
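The pooled standard deviation reconstructed in the snippet above (equation 15) feeds the usual Cohen's d effect size. A short worked sketch follows; the sample values are illustrative only and are not taken from the cited study.

```python
# Pooled standard deviation and Cohen's d, following equation (15) above.
import numpy as np

def cohens_d(sample1, sample2):
    x1_bar, x2_bar = np.mean(sample1), np.mean(sample2)
    sd1, sd2 = np.std(sample1, ddof=1), np.std(sample2, ddof=1)
    sd_pooled = np.sqrt((sd1 ** 2 + sd2 ** 2) / 2)   # equation (15)
    return (x1_bar - x2_bar) / sd_pooled

# Example with two equal-sized samples of hypothetical accuracy scores
a = [0.91, 0.88, 0.93, 0.90, 0.89]
b = [0.84, 0.86, 0.83, 0.85, 0.87]
print(cohens_d(a, b))
```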
“…This experiment has been conducted to show the relevance of the proposed GMM-DNN as a classifier to enhance speaker identification performance in emotional environments and to compare it with other classifiers in the literature [19], [21], [22]. Matejka et al. [19] studied utilizing Deep Neural Network Bottleneck (DNN-BN) features together with MFCCs in the task of i-vector-based speaker recognition. Richardson et al. [21] presented the application of a single DNN for both speaker recognition and language recognition using the "2013 Domain Adaptation Challenge speaker recognition (DAC13)" and the "NIST 2011 Language Recognition Evaluation (LRE11)" benchmarks.…”
confidence: 99%
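The DNN bottleneck pattern referenced in the snippet above can be sketched as follows: frame-level MFCCs pass through a network with one narrow hidden layer, and that layer's activations are appended to the MFCCs before the downstream (e.g. i-vector) back end. The layer sizes, the 40-dimensional input, and the random stand-in data are assumptions for illustration, not details from [19] or [21].

```python
# Illustrative sketch of DNN bottleneck (DNN-BN) features appended to MFCCs.
import numpy as np
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    def __init__(self, in_dim=40, bottleneck=64, n_targets=2000):
        super().__init__()
        self.front = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, bottleneck),          # narrow bottleneck layer
        )
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(bottleneck, n_targets))

    def forward(self, x):
        return self.head(self.front(x))

    def bottleneck_features(self, x):
        # Activations of the narrow layer, used as frame-level features
        with torch.no_grad():
            return self.front(x)

# mfcc: (n_frames, 40) array from any front end; random data stands in here
mfcc = np.random.randn(100, 40).astype(np.float32)
model = BottleneckDNN()
bn = model.bottleneck_features(torch.from_numpy(mfcc)).numpy()
hybrid = np.hstack([mfcc, bn])   # features passed to the back-end classifier
```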
“…Deep learning (a sub-branch of machine learning) algorithms have been popular for automatic recognition of digits and characters of different languages. Deep networks can be trained in a supervised fashion requiring labels, or in an unsupervised way without requiring labels [3], [4], [5]. In this work, we use an autoencoder network and a convolutional neural network (CNN) trained with an 85% portion of the dataset and tested with the remaining 15% of the data.…”
Section: Introduction
confidence: 99%
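The 85% / 15% train-test protocol mentioned in the snippet above can be sketched briefly. The sketch uses only the CNN branch (the autoencoder is omitted for brevity), and the dataset, image size, and network size are assumptions for illustration rather than the cited authors' exact setup.

```python
# Hedged sketch of an 85% / 15% split with a small CNN classifier.
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split

class SmallCNN(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(32 * 7 * 7, n_classes)   # assumes 28x28 inputs

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

# Hypothetical data: (N, 1, 28, 28) images with integer class labels
images = np.random.rand(1000, 1, 28, 28).astype(np.float32)
labels = np.random.randint(0, 10, size=1000).astype(np.int64)
x_train, x_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.15, random_state=0)   # 85% / 15% split

model = SmallCNN()
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(3):
    opt.zero_grad()
    loss = loss_fn(model(torch.from_numpy(x_train)), torch.from_numpy(y_train))
    loss.backward()
    opt.step()
```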