2014
DOI: 10.1186/1687-4722-2014-13

Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition

Abstract: We present a feature enhancement method that uses neural networks (NNs) to map reverberant features in the log-mel spectral domain to their corresponding anechoic features. The mapping is performed by cascade NNs trained with the Cascade2 algorithm and an implementation of segment-based normalization. Experiments using speaker identification (SID) and automatic speech recognition (ASR) systems were conducted to evaluate the method. The SID experiments were conducted using our own simulated and real reverbera…
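The abstract mentions segment-based normalization but does not spell it out. As a hedged sketch (the function name, segment length, and per-band zero-mean choice are illustrative assumptions, not the paper's exact procedure), one common form shifts each fixed-length segment of log-mel frames to zero mean per mel band before it is fed to the mapping network:

```python
import numpy as np

def segment_normalize(logmel, seg_len=100):
    """Per-segment mean normalization of log-mel features (sketch).

    Each block of `seg_len` frames is shifted to zero mean per mel
    band; the final, possibly shorter segment is handled the same way.
    """
    out = np.empty_like(logmel)
    n = logmel.shape[0]
    for start in range(0, n, seg_len):
        seg = logmel[start:start + seg_len]
        out[start:start + seg_len] = seg - seg.mean(axis=0)
    return out

feats = np.random.randn(250, 24)   # 250 frames, 24 mel bands
norm = segment_normalize(feats, seg_len=100)
```

Normalizing per segment rather than per utterance lets the statistics track slowly varying channel and reverberation effects within long recordings.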

Cited by 11 publications (6 citation statements) | References 38 publications (52 reference statements)
“…In choosing the context frames, we use every second frame relative to the center frame in order to reduce the redundancies caused by the windowing of STFT. Although this causes some information loss, this enables the supervectors to represent a longer context [16], [48]. In addition, we do not use the magnitude spectra of the context frames directly, but the difference of magnitude between the context frames and the center frame.…”
Section: DNN Spectral Models (mentioning)
Confidence: 99%
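The context scheme quoted above (every second frame relative to the center, stacked as magnitude differences from the center frame) can be sketched roughly as follows; the function name, context width, and edge clamping are illustrative assumptions, not the cited paper's exact code:

```python
import numpy as np

def context_supervector(frames, t, n_context=3, step=2):
    """Supervector for center frame t (sketch).

    Takes every `step`-th frame on each side of the center frame and
    stacks the *differences* from the center frame, plus the center
    frame itself, reducing redundancy from overlapping STFT windows
    while covering a longer temporal context.
    """
    center = frames[t]
    parts = [center]
    for k in range(1, n_context + 1):
        for idx in (t - k * step, t + k * step):
            idx = min(max(idx, 0), len(frames) - 1)  # clamp at edges
            parts.append(frames[idx] - center)
    return np.concatenate(parts)

mags = np.abs(np.random.randn(50, 257))  # |STFT| frames, 257 bins
sv = context_supervector(mags, t=10)     # (1 + 2*3) * 257 = 1799 values
```

Stacking differences rather than raw context spectra centers the context information on the current frame, which the quoted passage argues reduces redundancy.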
“…In choosing the context frames, we use every second frame relative to the center frame in order to reduce the redundancies caused by the windowing of STFT. Although this causes some information loss, this enables the supervectors to represent a longer context [28,29]. In addition, we do not use the feature values of context frames directly, but the difference between the values of the context frames and the center frame.…”
Section: DNN Input and Output (mentioning)
Confidence: 99%
“…Recent work has shown that deep neural networks can be very effective for channel compensation in speech recognition algorithms [7,8,9]. The type of channel compensation DNNs are used for falls into three basic categories: waveform compensation [10,8], feature compensation [11,12,13,8,9] and multicondition classification [14,15,8,9]. The first two categories are very similar in that they use a DNN regression to reconstruct some possibly intermediate feature representation from a clean channel using some possibly different feature representation of the same data from a noisy channel.…”
Section: Introduction (mentioning)
Confidence: 99%