2014
DOI: 10.1007/978-3-319-11581-8_48

Robust Multi-Band ASR Using Deep Neural Nets and Spectro-temporal Features

Abstract: Spectro-temporal feature extraction and multi-band processing were both designed to make speech recognizers more robust. Although both have been in use for a long time, very few attempts have been made to combine them. We therefore integrate two spectro-temporal feature extraction methods into a multi-band framework. We assess the performance of our spectro-temporal feature sets both individually (as a baseline) and in combination with multi-band processing in phone recognition tasks on c…

Cited by 2 publications (2 citation statements)
References 15 publications (24 reference statements)

“…Multi-band or sub-band representation of audio has been utilized in a variety of applications, such as audio coding [10], [11], audio enhancement [12], audio upmixing [13], and automatic speech recognition (ASR) [14]. For example, several audio coding standards, including MPEG Surround [10] and MPEG-H [11], have incorporated a quadrature mirror filter (QMF) to obtain uniformly distributed and oversampled frequency representations of audio signals.…”
Section: Introduction
confidence: 99%
“…In addition, multiband representation has been applied to audio enhancement and has improved the enhancement quality by variously performing noise attenuation according to the given sub-bands [12]. Moreover, a DNN was realized in a sub-band manner for ASR, which resulted in the reduction of the average word error rate compared to that in a full-band manner [14].…”
Section: Introduction
confidence: 99%