2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2016.7472808
Filterbank learning using Convolutional Restricted Boltzmann Machine for speech recognition

Cited by 27 publications (33 citation statements). References 19 publications.
“…Hence, the models learnt on NLSC might not be optimal for individual non-native speaker groups. However, the model trained on the native English speakers' database represents an optimal auditory code [17], [27] that captures the common traits of non-native speakers. From Table 1, we also observe that the accuracy of the handcrafted MFCC+SDC features is the highest, i.e., they perform better than our proposed data-driven features (WSJ and AURORA), specifically with SDC.…”
Section: Results on the Development Set
confidence: 99%
“…A review of different methods for unsupervised filterbank learning is given in [22]. The ConvRBM filterbank was shown to perform better than MFCC and Mel filterbank features for the speech recognition task [22], [23].…”
Section: Introduction
confidence: 99%
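The passage above contrasts learned filterbank features with handcrafted MFCC/Mel filterbank features. A minimal sketch of how features might be extracted from speech with a bank of learned filters (e.g., from a ConvRBM) is below; the function name, frame sizes, and the rectify-pool-log pipeline are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def learned_filterbank_features(signal, filters, frame_len=400, hop=160):
    """Hypothetical sketch: extract frame-level features by convolving a
    speech signal with learned subband filters, half-wave rectifying,
    average-pooling per frame, and log-compressing. Frame length and hop
    (400/160 samples ~ 25 ms/10 ms at 16 kHz) are illustrative assumptions.
    """
    feats = []
    for filt in filters:                        # one learned subband filter each
        y = np.convolve(signal, filt, mode="same")
        y = np.maximum(y, 0.0)                  # half-wave rectification
        frames = [y[s:s + frame_len].mean()     # average pooling per frame
                  for s in range(0, len(y) - frame_len + 1, hop)]
        feats.append(np.log(np.asarray(frames) + 1e-8))  # log compression
    return np.stack(feats)                      # shape: (n_filters, n_frames)
```

In this sketch, the learned filters play the role that the fixed triangular Mel filters play in MFCC extraction.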
“…The filterbank learned using ConvRBM was used to extract features from genuine and spoofed speech signals. Compared to our earlier works [22, 23], here we use Adam optimization [24] in ConvRBM training. The experiments on the ASV 2015 database show that ConvRBM-based features perform better than MFCC features.…”
Section: Introduction
confidence: 99%
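The quoted passage notes the switch to Adam optimization for ConvRBM training. A minimal sketch of one standard Adam parameter update (Kingma & Ba) is below; the hyperparameter defaults are the commonly cited ones, and how the gradient is obtained from the ConvRBM is outside this sketch:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and
    squared gradient (v), bias-corrected by the step count t (t >= 1)."""
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The per-parameter step normalization is what typically makes Adam less sensitive to learning-rate tuning than plain stochastic gradient descent.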
“…In equation (13), K ∈ ℝ^(h×w×c×c) represents the convolution kernel, b_k is the bias added after the convolution operation, and f(·) denotes the nonlinear activation function, the rectified linear unit (ReLU) [11], which is given as follows…”
Section: Convolutional Neural Network
confidence: 99%
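The quoted passage describes a convolutional layer of the form f(conv(x, K) + b) with a ReLU nonlinearity. A minimal numpy sketch of that computation is below; the function names, the valid-convolution choice, and the channels-last layout are illustrative assumptions:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: f(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def conv_layer(x, K, b):
    """Minimal valid 2-D convolution layer: y = f(conv(x, K) + b).

    x : (H, W, C)         input feature map (channels-last, an assumption)
    K : (h, w, C, C_out)  convolution kernels
    b : (C_out,)          per-output-channel bias
    """
    h, w, c_in, c_out = K.shape
    H, W, _ = x.shape
    out = np.zeros((H - h + 1, W - w + 1, c_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + h, j:j + w, :]  # local receptive field
            # sum over the patch's height, width, and input channels
            out[i, j, :] = np.tensordot(patch, K, axes=([0, 1, 2], [0, 1, 2]))
    return relu(out + b)
```

Deep-learning frameworks implement the same operation with optimized kernels; the explicit loops here only make the receptive-field arithmetic visible.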