2017
DOI: 10.1121/1.4983751

Auditory feature representation using convolutional restricted Boltzmann machine and Teager energy operator for speech recognition

Abstract: In this letter, authors propose an auditory feature representation technique with the filterbank learned using an annealing dropout convolutional restricted Boltzmann machine (ConvRBM) and noise-robust energy estimation using the Teager energy operator (TEO). TEO is applied on each subband of ConvRBM filterbank and pooled later to get the short-term spectral features. Experiments on AURORA 4 database show that the proposed features perform better than the Mel filterbank features. The relative improvement of 2.…
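
The abstract describes applying the TEO to each subband and pooling the result into short-term features. A minimal sketch of that idea follows, using the standard discrete TEO, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The function names, the average pooling, the 400-sample window, the 160-sample hop, and the log compression are assumptions made for illustration; the truncated abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np


def teager_energy(x):
    """Standard discrete TEO on interior samples: x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def teo_subband_features(subbands, win=400, hop=160):
    """Pool |TEO| of each subband over short frames.

    subbands: array of shape (n_filters, n_samples), e.g. filterbank outputs.
    win, hop: frame length and shift in samples (assumed values).
    Returns log average-pooled energies of shape (n_filters, n_frames).
    """
    psi = np.abs(np.array([teager_energy(s) for s in subbands]))
    n_frames = 1 + (psi.shape[1] - win) // hop
    pooled = np.stack(
        [psi[:, i * hop:i * hop + win].mean(axis=1) for i in range(n_frames)],
        axis=1,
    )
    # Log compression is an assumption; a small floor avoids log(0).
    return np.log(pooled + 1e-10)
```

For a pure sinusoid A*cos(w*n), the discrete TEO equals A^2 * sin^2(w) at every interior sample, so it tracks both amplitude and frequency, which is a convenient sanity check for the first function.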

Cited by 11 publications (8 citation statements)
References: 12 publications

“…Our future works includes detailed analysis of natural and spoof speech regarding the nature of subband filters and frequency scale. We would also like use our Unsupervised Deep Auditory Model (UDAM) [30] along with TEO [31] for the SSD task.…”
Section: Discussion (mentioning)
confidence: 99%
“…Compared to our earlier work in [13], [14], we have used noisy leaky rectifier linear units (NLReLU) proposed in [19] to avoid the limitations of ReLU. Annealing dropout is applied in the ConvRBM training with the annealing schedule chosen in [20]. The ConvRBM training is performed using contrastive divergence (CD) [21].…”
Section: ConvRBM for Auditory Filterbank Learning (mentioning)
confidence: 99%
“…The moment parameters of Adam optimization was chosen to be β1=0.5, and β2=0.999. The annealing dropout probability was chosen to be 0.3 based on our earlier experiments in the ASR [20] and environmental sound classification [30]. After the model was trained, the features were extracted from the speech signal as discussed in Section 2.2.…”
Section: Training of ConvRBM and Feature Extraction (mentioning)
confidence: 99%
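
The statement above gives the Adam moment parameters (β1=0.5, β2=0.999) and an annealing dropout probability of 0.3. A rough sketch of how those settings could fit together is shown below. The linear decay of the dropout probability to zero, the learning rate, and both function names are assumptions for illustration; the quoted text does not state the schedule or the step size.

```python
import numpy as np


def annealed_dropout_prob(epoch, n_epochs, p0=0.3):
    """Dropout probability decayed linearly from p0 to 0 (assumed schedule)."""
    return max(0.0, 1.0 - epoch / n_epochs) * p0


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update with bias correction; t is the 1-based step count.

    beta1 and beta2 follow the quoted statement; lr is an assumed value.
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

With this schedule, a 10-epoch run would start at a dropout probability of 0.3, reach 0.15 at epoch 5, and train without dropout from epoch 10 onward.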
“…For analysis of the subband filters, we first sort it according to the center frequencies (CFs) of the subband filters as done in [21]. [Figure 1: Block diagram of the proposed ConvRBM with dropout mask. After [21], [22].]…”
Section: Analysis of Filterbank, 3.1 Analysis of Subband Filters (mentioning)
confidence: 99%
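
The statement above mentions sorting learned subband filters by center frequency. One plausible way to do this is sketched below, taking the CF of each filter as the peak of its magnitude frequency response. That CF definition, the 16 kHz sampling rate, the FFT size, and the function name are assumptions; the quoted text does not say how the CFs were estimated.

```python
import numpy as np


def sort_filters_by_cf(filters, fs=16000, n_fft=1024):
    """Sort time-domain filters by estimated center frequency.

    filters: array of shape (n_filters, filter_length), impulse responses.
    The CF is taken as the frequency of the largest magnitude-response bin
    (an assumed definition). Returns (sorted_filters, sorted_cfs_in_hz).
    """
    filters = np.asarray(filters, dtype=float)
    spectra = np.abs(np.fft.rfft(filters, n=n_fft, axis=1))
    cfs = np.argmax(spectra, axis=1) * fs / n_fft
    order = np.argsort(cfs)
    return filters[order], cfs[order]
```

The frequency resolution of the estimate is fs / n_fft, so a larger n_fft gives finer CF estimates through zero-padding.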
“…In this paper, we propose to exploit ConvRBM as a frontend for filterbank learning from the raw audio signals. Compared to our earlier works in [20], [21] and [22], here we have used Adam optimization [23] along with an annealed dropout technique [24]. Invariant representation is learned from the raw audio using ConvRBM and higher-level invariance is achieved using supervised CNN as a classifier.…”
Section: Introduction (mentioning)
confidence: 99%