2013
DOI: 10.1007/978-3-642-29752-6_15

Auditory Processing Inspired Robust Feature Enhancement for Speech Recognition

Cited by 2 publications (2 citation statements)
References 24 publications
“…In this work, we adopt gammatonegrams, which are a visual representation of the energy of a signal based on the short-time Fourier transform (STFT) and the application of Gammatone filterbanks [13], first introduced in [11]. It has indeed been shown that such filtering guarantees robustness to noise in speech analysis tasks [15,24]. The gammatonegrams are generated following the specifications in [29] and [17], employing a bank of 64 filters.…”
Section: Feature Representation
confidence: 99%
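The statement above describes the gammatonegram as STFT-based signal energy passed through a bank of 64 Gammatone filters. The sketch below illustrates that general construction only and is not the exact pipeline of [29] or [17]: the ERB spacing follows Glasberg and Moore, the filter shape is an approximate gammatone magnitude response, and the sample rate, window, hop, frequency range and normalisation are assumed values.

```python
# Minimal sketch: gammatonegram as STFT energy weighted by a bank of
# 64 gammatone-like filters with ERB-spaced centre frequencies.
# All parameter values below are illustrative assumptions.
import numpy as np
import librosa  # used only for audio loading and the STFT


def erb_space(fmin, fmax, n_filters):
    """Centre frequencies equally spaced on the ERB-rate (Cam) scale."""
    cam = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    inv = lambda c: (10.0 ** (c / 21.4) - 1.0) * 1000.0 / 4.37
    return inv(np.linspace(cam(fmin), cam(fmax), n_filters))


def gammatone_weights(sr, n_fft, n_filters=64, fmin=50.0, fmax=8000.0, order=4):
    """Approximate gammatone magnitude responses sampled at the FFT bins."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    centres = erb_space(fmin, fmax, n_filters)
    weights = np.zeros((n_filters, freqs.size))
    for i, fc in enumerate(centres):
        erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)   # ERB bandwidth at fc
        b = 1.019 * erb                            # gammatone bandwidth
        weights[i] = (1.0 + ((freqs - fc) / b) ** 2) ** (-order / 2.0)
        weights[i] /= weights[i].sum()             # rough normalisation
    return weights, centres


def gammatonegram(path, sr=16000, n_fft=512, hop=160, n_filters=64):
    y, sr = librosa.load(path, sr=sr, mono=True)
    power = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    weights, centres = gammatone_weights(sr, n_fft, n_filters)
    gtg = weights @ power                          # (64, frames) band energies
    return 10.0 * np.log10(gtg + 1e-10), centres   # log-energy in dB
```

Weighting the STFT power spectrogram rather than running a time-domain filterbank is the common fast approximation used when gammatonegrams are computed for visualisation or as front-end features.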
“…Gammatonegrams decompose a signal by passing it through a bank of gammatone filters equally spaced on the ERB scale and were designed to model the human auditory system. We opted for gammatonegrams as they have proven robust to noisy environments [28] and are thus suited to robot data. We pre-processed the dataset voices by taking 1-second chunks of audio with a hop length of 250 ms and discarding chunks with less than 80% voice data, using the Google voice activity detector.…”
Section: Multi Classification
confidence: 99%
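The pre-processing described in this second statement (1-second chunks, 250 ms hop, chunks kept only if at least 80% voiced) can be sketched as below, using the py-webrtcvad wrapper around the WebRTC/Google voice activity detector. The sample rate, VAD aggressiveness mode and 30 ms frame length are assumptions, not values given in the quote.

```python
# Minimal sketch: keep 1 s chunks (250 ms hop) of 16-bit mono PCM whose
# frames are at least 80% voiced according to the WebRTC VAD.
import numpy as np
import webrtcvad


def voiced_chunks(pcm16, sr=16000, chunk_s=1.0, hop_s=0.25,
                  frame_ms=30, min_voiced=0.8, mode=2):
    """Yield chunks of an int16 waveform that pass the voiced-ratio test."""
    vad = webrtcvad.Vad(mode)                      # aggressiveness 0-3 (assumed 2)
    chunk_len, hop_len = int(chunk_s * sr), int(hop_s * sr)
    frame_len = int(sr * frame_ms / 1000)          # 30 ms VAD frames
    for start in range(0, len(pcm16) - chunk_len + 1, hop_len):
        chunk = pcm16[start:start + chunk_len]
        frames = [chunk[i:i + frame_len]
                  for i in range(0, chunk_len - frame_len + 1, frame_len)]
        voiced = [vad.is_speech(f.tobytes(), sr) for f in frames]
        if np.mean(voiced) >= min_voiced:          # e.g. >= 80% voiced frames
            yield chunk
```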