2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
DOI: 10.1109/waspaa.2013.6701894
The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech

Abstract: Recently, substantial progress has been made in the field of reverberant speech signal processing, including both single- and multichannel dereverberation techniques and automatic speech recognition (ASR) techniques robust to reverberation. To evaluate state-of-the-art algorithms and obtain new insights regarding potential future research directions, we propose a common evaluation framework including datasets, tasks, and evaluation metrics for both speech enhancement and ASR techniques. The proposed framework …

Cited by 321 publications (258 citation statements)
References 11 publications
“…The adaptation of DNN acoustic models to specific acoustic conditions has been investigated, e.g., (Seltzer et al., 2013; Karanasou et al., 2014); however, it has been evaluated in multi-condition settings rather than actual mismatched conditions. The impact of the number of microphones on the WER obtained after enhancing reverberated speech was evaluated in the REVERB challenge (Kinoshita et al., 2013), but the impact of microphone distance was not considered and no such large-scale experiment was performed with noisy speech. To our knowledge, a study of the impact of mismatched noise environments on the resulting ASR performance is also missing.…”
Section: Introduction (mentioning confidence: 99%)
“…Few existing datasets involve both real and simulated data. In the REVERB dataset (Kinoshita et al., 2013), the speaker distances for real and simulated data differ, which does not allow a fair comparison. The CHiME-3 dataset (Barker et al., 2015) provides a data simulation tool which aims to reproduce the characteristics of real data for training, and twinned real and simulated data pairs for development and testing.…”
Section: Introduction (mentioning confidence: 99%)
“…We took observed signals from the REVERB challenge database [19], specifically the real-world eight-channel recording AMI WSJ20-Array1-* T10c020c.wav in RealData. The recording contains reverberation of RT60 ∼ 0.7 s and some noise, and was truncated at 2 s. The sampling frequency was 16 kHz.…”
Section: Experimental Verification (mentioning confidence: 99%)
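The preprocessing described in the excerpt above (an eight-channel recording at 16 kHz, truncated at 2 s) can be sketched as follows. This is a minimal illustration, not the authors' code: the random array merely stands in for the real REVERB recording, and the helper name is ours.

```python
import numpy as np

FS = 16000        # sampling frequency stated in the excerpt
DURATION_S = 2.0  # recordings were truncated at 2 s

def truncate(x, fs=FS, duration_s=DURATION_S):
    """Keep only the first duration_s seconds of a (samples, channels) array."""
    n = int(round(duration_s * fs))
    return x[:n, :]

# An eight-channel placeholder signal standing in for the real recording:
x = np.random.randn(3 * FS, 8)
y = truncate(x)  # y.shape == (32000, 8)
```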
“…The frame length and hop for STFT were 1024 and 256 points (equivalent to 64 and 16 ms), respectively; the window type Hamming; the prediction order K = 3; the prediction delay Δ = 3; the number of iterations 20. We evaluate the following measures as defined in [19]: the cepstrum distance (CD), the log-likelihood ratio (LLR), the frequency-weighted segmental signal-to-noise ratio (FWSegSNR), and the speech-to-reverberation modulation energy ratio (SRMR). These measures were evaluated by using the headset recording AMI WSJ20-Headset1 T10c020c.wav as reference.…”
Section: Experimental Verification (mentioning confidence: 99%)
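To make the evaluation setup above concrete, the sketch below computes a cepstral distance (CD) from the stated STFT settings (1024-point frames, 256-point hop, Hamming window). This uses one common per-frame definition and is only a rough sketch; the official REVERB evaluation scripts may differ in detail (e.g., cepstral order and frame selection).

```python
import numpy as np

FRAME_LEN, HOP = 1024, 256  # STFT frame and hop from the excerpt (64 ms / 16 ms at 16 kHz)

def cepstra(x, frame_len=FRAME_LEN, hop=HOP, order=24):
    """Low-order real cepstral coefficients per Hamming-windowed frame."""
    win = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * win
                       for i in range(n_frames)])
    log_spec = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)
    ceps = np.fft.irfft(log_spec, axis=1)
    return ceps[:, 1:order + 1]  # drop c0, keep coefficients 1..order

def cepstral_distance(ref, test):
    """Mean per-frame cepstral distance in dB (one common definition)."""
    d = cepstra(ref) - cepstra(test)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(d ** 2, axis=1))
    return float(np.mean(per_frame))
```

Identical reference and test signals yield a distance of zero; any spectral mismatch increases it, which is why CD is reported alongside LLR, FWSegSNR, and SRMR as a dereverberation quality measure.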
“…In such scenarios, close-talking input is often impractical or unsafe, and, while very challenging, allowing the speaker to be far from the microphone is highly desirable. To validate the effectiveness of state-of-the-art speech enhancement and ASR techniques in distant-talking conditions, several challenges have been organized [4], [22], [44]. Among these, the Computational Hearing in Multisource Environments (CHiME) challenges recently introduced noise-robust speech processing tasks with a small number of microphones [4], [44].…”
Section: Introduction (mentioning confidence: 99%)