StoRIR: Stochastic Room Impulse Response Generation for Audio Data Augmentation

Masztalski, Piotr; Matuszewski, Mateusz; Piaskowski, Karol; Romaniuk, Michał

doi:10.21437/interspeech.2020-2261

Cited by 7 publications

(3 citation statements)

References 16 publications


“…We use the default configuration provided in the official example 4 . 4) StoRIR [17]: StoRIR uses a random energy-rescaled impulse train to estimate the RIR filter. Although it is not an ISM-based method, we select it as one of the comparable methods as it also generates the RIR filters in a stochastic way.…”

Section: Visualization (mentioning)

confidence: 99%

FRA-RIR: Fast Random Approximation of the Image-source Method

Luo, Yu

2022

Preprint

The training of modern speech processing systems often requires a large amount of simulated room impulse response (RIR) data in order to allow the systems to generalize well in real-world, reverberant environments. However, simulating realistic RIR data typically requires accurate physical modeling, and the acceleration of such simulation process typically requires certain computational platforms such as a graphics processing unit (GPU). In this paper, we propose FRA-RIR, a fast random approximation method of the widely-used image-source method (ISM), to efficiently generate realistic RIR data without specific computational devices. FRA-RIR replaces the physical simulation in the standard ISM by a series of random approximations, which significantly speeds up the simulation process and enables its application in on-the-fly data generation pipelines. Experiments show that FRA-RIR can not only be significantly faster than other existing ISM-based RIR simulation tools on standard computational platforms, but also improves the performance of speech denoising systems evaluated on real-world RIR when trained with simulated RIR. A Python implementation of FRA-RIR is available online.
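The citation statement above describes StoRIR as using "a random energy-rescaled impulse train" to form the RIR filter. The idea can be illustrated with a minimal sketch. It is not the StoRIR or FRA-RIR implementation: the function name `sketch_stochastic_rir`, the `density` parameter, and the purely exponential decay envelope are assumptions made here for illustration only.

```python
import numpy as np


def sketch_stochastic_rir(t60=0.5, fs=16000, density=2000.0, seed=0):
    """Hypothetical sketch of a stochastic RIR (not the published algorithm).

    t60     : target reverberation time in seconds (assumed input)
    fs      : sampling rate in Hz
    density : assumed average number of reflections per second
    """
    rng = np.random.default_rng(seed)
    n = int(t60 * fs)
    t = np.arange(n) / fs

    # Random impulse train: each sample carries a reflection with
    # probability density / fs, with a random sign.
    has_impulse = rng.random(n) < density / fs
    signs = rng.choice([-1.0, 1.0], size=n)
    train = has_impulse * signs

    # Energy rescaling: energy falls by 60 dB at t = t60, which is an
    # amplitude factor of 10 ** (-3) at that time.
    envelope = 10.0 ** (-3.0 * t / t60)
    rir = train * envelope

    # Direct path at time zero, then normalise to unit peak.
    rir[0] = 1.0
    return rir / np.max(np.abs(rir))
```

For augmentation, such a filter would typically be convolved with a dry recording, for example `np.convolve(speech, sketch_stochastic_rir(t60=0.4))`, where `speech` is a one-dimensional array of samples.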


“…Despite this, these datasets do not straightforwardly generalize to, e.g., multichannel settings with a specific microphone array geometry. To alleviate such stringent computational requirements, a fast stochastic RIR simulator was proposed in [13] and used to train speech enhancement models. While better generalization to real data was demonstrated w.r.t.…”

Section: Introduction (mentioning)

confidence: 99%

Realistic Sources, Receivers and Walls Improve The Generalisability of Virtually-Supervised Blind Acoustic Parameter Estimators

Srivastava, Deleforge

2022

2022 International Workshop on Acoustic Signal Enhancement (IWAENC)

Blind acoustic parameter estimation consists in inferring the acoustic properties of an environment from recordings of unknown sound sources. Recent works in this area have utilized deep neural networks trained either partially or exclusively on simulated data, due to the limited availability of real annotated measurements. In this paper, we study whether a model purely trained using a fast image-source room impulse response simulator can generalize to real data. We present an ablation study on carefully crafted simulated training sets that account for different levels of realism in source, receiver and wall responses. The extent of realism is controlled by the sampling of wall absorption coefficients and by applying measured directivity patterns to microphones and sources. A state-of-the-art model trained on these datasets is evaluated on the task of jointly estimating the room's volume, total surface area, and octave-band reverberation times from multiple, multichannel speech recordings. Results reveal that every added layer of simulation realism at train time significantly improves the estimation of all quantities on real signals.


“…In recent years, an increasing number of RIR generators have been introduced to generate a realistic RIR for a given acoustic environment [5][6][7][8]. Accurate RIR generators can generate RIRs with various acoustic effects (e.g., diffraction, scattering, early reflections, late reverberations) [9].…”

Section: Introduction (mentioning)

confidence: 99%

FAST-RIR: Fast neural diffuse room impulse response generator

Ratnarajah, Zhang, Meng et al.

2021

Preprint

We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment. Our FAST-RIR takes rectangular room dimensions, listener and speaker positions, and reverberation time (T60) as inputs and generates specular and diffuse reflections for a given acoustic environment. Our FAST-RIR is capable of generating RIRs for a given input T60 with an average error of 0.02s. We evaluate our generated RIRs in automatic speech recognition (ASR) applications using Google Speech API, Microsoft Speech API, and Kaldi tools. We show that our proposed FAST-RIR with batch size 1 is 400 times faster than a state-of-the-art diffuse acoustic simulator (DAS) on a CPU and gives similar performance to DAS in ASR experiments. Our FAST-RIR is 12 times faster than an existing GPU-based RIR generator (gpuRIR). We show that our FAST-RIR outperforms gpuRIR by 2.5% in an AMI far-field ASR benchmark.
