Reverb Conversion Of Mixed Vocal Tracks Using An End-To-End Convolutional Deep Neural Network

Koo, Junghyun; Paik, Seungryeol; Lee, Kyogu

doi:10.1109/icassp39728.2021.9414038

Cited by 8 publications

(6 citation statements)

References 19 publications

(19 reference statements)

“…Comparison methods. We evaluate the proposed method against three baselines: Reverb Conversion (RC) (Koo et al, 2021), Music Enhancement (ME) (Kandpal et al, 2022), and Unsupervised Dereverberation (UD) (Saito et al, 2022). RC is a state-of-the-art, end-to-… [results table spilled into the text, column headers not preserved: (Koo et al, 2021): 5.69, 0.02, 7.23; Music Enhancement (Kandpal et al, 2022): 7.51, −23.9, 7.92; Unsupervised Dereverberation (Saito et al, 2022): 4…] Evaluation metrics.…”

Section: Vocal Dereverberation (mentioning)

confidence: 99%

“…We evaluate the proposed method against three baselines: Reverb Conversion (RC) (Koo et al, 2021), Music Enhancement (ME) (Kandpal et al, 2022), and Unsupervised Dereverberation (UD) (Saito et al, 2022). RC is a state-of-the-art, end-to-… [results table spilled into the text, column headers not preserved: (Koo et al, 2021): 5.69, 0.02, 7.23; Music Enhancement (Kandpal et al, 2022): 7.51, −23.9, 7.92; Unsupervised Dereverberation (Saito et al, 2022): 4…] Evaluation metrics. For quantitative comparison of the different methods, the metrics are the scale-invariant signal-to-distortion ratio (SI-SDR) (Roux et al, 2019) improvement, the Fréchet Audio Distance (FAD) (Kilgour et al, 2018), and the speech-to-reverberation modulation energy ratio (SRMR) (Santos et al, 2014).…”

Section: Vocal Dereverberation (mentioning)

confidence: 99%
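The SI-SDR improvement named in the statement above can be illustrated with a minimal sketch. It follows the standard SI-SDR definition (project the estimate onto the reference, then compare the energy of the projection with the energy of the residual). The function names `si_sdr` and `si_sdr_improvement` and the toy signals are assumptions made for illustration, not code from the citing paper.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB for two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have equal length")
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0:
        raise ValueError("reference must not be all zeros")
    # Optimal scaling of the reference that best explains the estimate.
    alpha = dot / ref_energy
    target = [alpha * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    if noise_energy == 0:
        return math.inf
    if target_energy == 0:
        return -math.inf
    return 10.0 * math.log10(target_energy / noise_energy)


def si_sdr_improvement(estimate, degraded, reference):
    """Gain in SI-SDR of the processed estimate over the unprocessed input."""
    return si_sdr(estimate, reference) - si_sdr(degraded, reference)


# Hypothetical toy signals: a dry reference, a wet input, a processed output.
dry = [1.0, -1.0, 1.0, -1.0]
wet = [1.2, -0.8, 1.2, -0.8]
processed = [1.1, -0.9, 1.1, -0.9]
improvement_db = si_sdr_improvement(processed, wet, dry)
```

Because the reference is rescaled before the comparison, multiplying the estimate by any non-zero gain leaves the score unchanged, which is why the metric suits tasks where output level is not part of the target.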

“…We use an exponential moving average over model parameters with a rate of 0.9999 (Song & Ermon, 2020). […, 1112) from those of our test dataset (Koo et al, 2021). We input pairs of wet and dry signals since this method needs them for dereverberation.…”

Section: C2 Vocal Dereverberation (mentioning)

confidence: 99%
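The exponential moving average over model parameters mentioned in the statement above can be sketched as follows. This is a generic illustration of the technique with the quoted rate of 0.9999. The helper `ema_update` and the dictionary-of-floats parameter layout are assumptions, not the citing paper's implementation.

```python
def ema_update(shadow, params, decay=0.9999):
    """Blend current parameters into the running average, in place.

    shadow: dict mapping parameter name to averaged value (float)
    params: dict mapping parameter name to current value (float)
    """
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow


# Hypothetical single-parameter model: the average starts at the initial
# weight and moves slowly toward the updated weight after one step.
shadow = {"w": 1.0}
ema_update(shadow, {"w": 0.0})
```

With a rate this close to 1, each training step shifts the averaged weights by only 0.01% of the gap to the current weights, so the averaged copy changes far more smoothly than the raw parameters.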

GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration

Murata, Saito, Lai, et al. 2023. Preprint.

Pre-trained diffusion models have been successfully used as priors in a variety of linear inverse problems, where the goal is to reconstruct a signal from noisy linear measurements. However, existing approaches require knowledge of the linear operator. In this paper, we propose GibbsDDRM, an extension of Denoising Diffusion Restoration Models (DDRM) to a blind setting in which the linear measurement operator is unknown. GibbsDDRM constructs a joint distribution of the data, measurements, and linear operator by using a pre-trained diffusion model for the data prior, and it solves the problem by posterior sampling with an efficient variant of a Gibbs sampler. The proposed method is problem-agnostic, meaning that a pre-trained diffusion model can be applied to various inverse problems without fine-tuning. In experiments, it achieved high performance on both blind image deblurring and vocal dereverberation tasks, despite the use of simple generic priors for the underlying linear operators.

“…An end-to-end approach of converting black-box music effects with already processed audio tracks comes in need upon the loss of original dry source tracks or unavailability of replicating the setup of the mastering chain at the time, which may occur especially with old recordings. [8] introduced a system that interchanges the musical reverberant effects of two differently processed vocal tracks yet required massive storage of already-processed data to train. We propose an end-to-end remastering system that thoroughly converts the originally mastered effects to the desired style.…”

Section: Related Work (mentioning)

confidence: 99%

End-to-end Music Remastering System Using Self-supervised and Adversarial Training

Koo, Paik, Lee. 2022. Preprint. Self-citation.

Mastering is an essential step in music production, but it is also a challenging task that has to go through the hands of experienced audio engineers, where they adjust tone, space, and volume of a song. Remastering follows the same technical process, in which the context lies in mastering a song for the times. As these tasks have high entry barriers, we aim to lower the barriers by proposing an end-to-end music remastering system that transforms the mastering style of input audio to that of the target. The system is trained in a self-supervised manner, in which released pop songs were used for training. We also anticipated the model to generate realistic audio reflecting the reference's mastering style by applying a pre-trained encoder and a projection discriminator. We validate our results with quantitative metrics and a subjective listening test and show that the model generated samples of mastering style similar to the target.

“…Blind estimation of the room impulse response from reverberant speech has also been explored [57,65]. In music production, acoustic matching is applied to change the reverberation to emulate that from a target space or processing algorithm [35,51]. Recent work conditions the target-audio generation on a low-dimensional audio embedding [59].…”

Section: Related Work (mentioning)

confidence: 99%

Visual Acoustic Matching

Chen, Gao, Calamia, et al. 2022. Preprint.

We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment. Given an image of the target environment and a waveform for the source audio, the goal is to re-synthesize the audio to match the target room acoustics as suggested by its visible geometry and materials. To address this novel task, we propose a cross-modal transformer model that uses audio-visual attention to inject visual properties into the audio and generate realistic audio output. In addition, we devise a self-supervised training objective that can learn acoustic matching from in-the-wild Web videos, despite their lack of acoustically mismatched audio. We demonstrate that our approach successfully translates human speech to a variety of real-world environments depicted in images, outperforming both traditional acoustic matching and more heavily supervised baselines.
