Interspeech 2016
DOI: 10.21437/interspeech.2016-731

Realistic Multi-Microphone Data Simulation for Distant Speech Recognition

Abstract: The availability of realistic simulated corpora is of key importance for the future progress of distant speech recognition technology. The reliability, flexibility and low computational cost of a data simulation process may ultimately allow researchers to train, tune and test different techniques in a variety of acoustic scenarios, avoiding the laborious effort of directly recording real data from the targeted environment. In the last decade, several simulated corpora have been released to the research communit…

Cited by 32 publications (32 citation statements)
References: 30 publications
“…To validate our model in a more challenging scenario, experiments were also conducted in distant-talking conditions with the DIRHA-English dataset 4 [36,37]. Training was based on the original WSJ-5k corpus (consisting of 7,138 sentences uttered by 83 speakers) that was contaminated with a set of impulse responses measured in a domestic environment [37]. The test phase was carried out with the real part of the dataset, consisting of 409 WSJ sentences uttered in the aforementioned environment by six native American speakers.…”
Section: Corpora and Tasks
Citation type: mentioning
confidence: 99%
“…The assumption is that much like blind or non-intrusive acoustic parameter estimation can be used as a proxy for estimating ASR performance [16], a neural network model can be trained to extract features from reverberant speech that are correlated with WER. The proposed method assumes reverberant speech samples transcribed by an ASR engine and the corresponding WER per utterance calculated by (7). The same data split as described in Section 2 is used.…”
Section: Predicting Wer Blindly From Reverberant Speech Using a Cnn-l
Citation type: mentioning
confidence: 99%
“…To evaluate the impact of room acoustics on the accuracy of speaker verification, a proper dataset of reverberant audio is needed. An alternative that fills a qualitative gap between unsatisfying simulation (despite the improvement of realism reported in Ravanelli et al., 2016) and costly and demanding real speaker recording, is retransmission. To our advantage, we can also use the fact that a known dataset can be retransmitted so that the performances are readily comparable with known benchmarks.…”
Section: Nist Retransmitted Set (But-ret)
Citation type: mentioning
confidence: 99%