Differentiable Signal Processing With Black-Box Audio Effects

Ramírez, M. A.; Wang, Oliver; Smaragdis, Paris; Bryan, Nicholas J.

doi:10.1109/icassp39728.2021.9415103

Cited by 16 publications (23 citation statements). References 21 publications.


“…Nevertheless, their performance drops significantly when processing real-world distorted guitar sounds, as we limited our experiments to two simplified methods for distortion, which represent only part of a commercial distortion effect. The positive outcomes of our experiments suggest that the neural models in this work can potentially be used to remove more complex distortion effects, if the training data includes such complex distortion algorithms (or their high-quality emulation, e.g., by using the frameworks [47] or [48]). We plan to validate this hypothesis in future work.…”

Section: Discussion (mentioning), confidence: 99%

Removing Distortion Effects in Music Using Deep Neural Networks

Imort, Fabbro, Ramírez, et al. 2022. Preprint.


Audio effects are an essential element in the context of music production, and therefore modeling analog audio effects has been extensively researched for decades using system-identification methods, circuit simulation, and, more recently, deep learning. However, only a few works have tackled the reconstruction of signals that were processed using an audio effect unit. Given the recent advances in music source separation and automatic mixing, the removal of audio effects could facilitate an automatic remixing system. This paper focuses on removing distortion and clipping applied to guitar tracks for music production while presenting a comparative investigation of different deep neural network (DNN) architectures on this task. We achieve exceptionally good results in distortion removal using DNNs for effects that superimpose the clean signal on the distorted signal, while the task is more challenging if the clean signal is not superimposed. Nevertheless, in the latter case, the neural models under evaluation surpass one state-of-the-art declipping system in terms of source-to-distortion ratio, leading to better quality and faster inference.
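The abstract contrasts distortion effects that keep the clean signal superimposed on the distorted one with effects that do not, and it scores results by source-to-distortion ratio (SDR). A minimal sketch of both cases follows, assuming hard clipping and a tanh overdrive with a dry/wet mix as stand-ins; the paper's actual distortion algorithms are not specified here. `hard_clip`, `blended_overdrive`, and `sdr_db` are hypothetical names, and `sdr_db` is the simple energy ratio, not the full BSS Eval definition.

```python
import math


def hard_clip(x, threshold=0.3):
    """Hard clipping: samples beyond +/-threshold are flattened; no dry signal is kept."""
    return [max(-threshold, min(threshold, s)) for s in x]


def blended_overdrive(x, drive=5.0, mix=0.5):
    """tanh saturation mixed with the dry input, so the clean signal stays superimposed."""
    wet = [math.tanh(drive * s) for s in x]
    return [mix * w + (1.0 - mix) * s for w, s in zip(wet, x)]


def sdr_db(reference, estimate):
    """Simplified SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / error)


# A 220 Hz test tone at 8 kHz, 0.1 s long (illustrative values only).
sr = 8000
clean = [0.8 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr // 10)]
clipped = hard_clip(clean)
driven = blended_overdrive(clean)
print(f"SDR clipped: {sdr_db(clean, clipped):.1f} dB, overdrive: {sdr_db(clean, driven):.1f} dB")
```

A removal model would be scored as `sdr_db(clean, model_output)`, where higher is better. In `blended_overdrive`, the term `(1.0 - mix) * s` passes a scaled copy of the dry input through linearly, which offers one plausible intuition for why the superimposed case is easier to undo.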



“…As a third alternative, nondifferentiable DSP implementations can be used directly with numerical gradient approximation methods, an approach that has been demonstrated in audio effect modeling, removal of breaths, and music mastering [17]. In the context of controlling audio effects $h(x, p)$ with input audio $x$ and parameters $p \in \mathbb{R}^P$, we only need to estimate the partial derivative with respect to each parameter, $\partial h(x, p) / \partial p_i$.…”

Section: Alternative Differentiation Methods (mentioning), confidence: 99%
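The statement above notes that only the partial derivatives $\partial h(x, p) / \partial p_i$ are needed. One straightforward way to estimate them, assumed here for illustration and not necessarily the estimator used in [17], is central finite differences. This approach treats $h$ as a black box and costs two effect evaluations per parameter. `finite_difference_jacobian` and `gain_offset_effect` are hypothetical names.

```python
def finite_difference_jacobian(h, x, p, eps=1e-4):
    """Estimate dh(x, p)/dp_i for every parameter i by central differences.

    h is treated as a black box mapping (audio, parameters) -> audio.
    Returns P columns; column i holds the per-sample partials w.r.t. p_i.
    Cost: 2 * P calls to h.
    """
    columns = []
    for i in range(len(p)):
        p_plus = list(p)
        p_plus[i] += eps
        p_minus = list(p)
        p_minus[i] -= eps
        y_plus = h(x, p_plus)
        y_minus = h(x, p_minus)
        columns.append([(a - b) / (2.0 * eps) for a, b in zip(y_plus, y_minus)])
    return columns


def gain_offset_effect(x, p):
    """Hypothetical stand-in for a black-box effect: y = p[0] * x + p[1]."""
    return [p[0] * s + p[1] for s in x]


x = [0.1, -0.4, 0.7]
jac = finite_difference_jacobian(gain_offset_effect, x, [2.0, 0.5])
# Analytically, dy/dp_0 equals x sample by sample and dy/dp_1 equals 1.
```

During training, these columns would be combined with the upstream gradient by the chain rule, $\partial L / \partial p_i = \sum_n (\partial L / \partial y_n)(\partial y_n / \partial p_i)$. The cost grows linearly with the number of parameters $P$.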

“…However, this approach requires manual implementation and often modification of DSP, imposing a high engineering cost and potentially limiting its application. To work around this, two alternative methods have been proposed: neural proxies (NP) of audio effects [16] and numerical gradient approximation schemes [17]. However, given that these approaches were proposed and evaluated on different tasks, it is difficult to fully understand their relative performance.…”

Section: Introduction (mentioning), confidence: 99%
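Simultaneous perturbation stochastic approximation (SPSA) is one widely used numerical gradient approximation scheme. The sketch below assumes SPSA as the estimator and does not reproduce the exact procedure of [17]. Unlike per-parameter finite differences, it perturbs all parameters at once with random signs and needs only two loss evaluations for any number of parameters. `spsa_gradient` and `linear_loss` are hypothetical names.

```python
import random


def spsa_gradient(loss, p, c=1e-2, rng=random):
    """One SPSA estimate of the gradient of a scalar loss at parameters p.

    All parameters are perturbed simultaneously by +/- c * delta, with each
    delta_i drawn from {-1, +1}, so only two loss evaluations are needed
    regardless of the number of parameters P.
    """
    delta = [rng.choice((-1.0, 1.0)) for _ in p]
    loss_plus = loss([pi + c * di for pi, di in zip(p, delta)])
    loss_minus = loss([pi - c * di for pi, di in zip(p, delta)])
    scale = (loss_plus - loss_minus) / (2.0 * c)
    return [scale / di for di in delta]


def linear_loss(p):
    """Hypothetical check function whose true gradient is [3, -1]."""
    return 3.0 * p[0] - 1.0 * p[1]


# Single estimates are noisy; their average approaches the true gradient.
rng = random.Random(0)
n_trials = 10000
mean_grad = [0.0, 0.0]
for _ in range(n_trials):
    g = spsa_gradient(linear_loss, [0.2, 0.7], rng=rng)
    mean_grad = [m + gi / n_trials for m, gi in zip(mean_grad, g)]
```

The trade-off is variance. Each estimate costs two evaluations instead of $2P$, but individual estimates are noisy, so in practice they are averaged implicitly over many optimizer steps.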

Style Transfer of Audio Effects with Differentiable Signal Processing

Steinmetz, Bryan, Reiss. 2022. J. Audio Eng. Soc.


This work presents a framework to impose the audio effects and production style from one recording onto another by example, with the goal of simplifying the audio production process. A deep neural network was trained to analyze an input recording and a style reference recording and predict the control parameters of audio effects used to render the output. In contrast to past work, this approach integrates audio effects as differentiable operators, enabling backpropagation through audio effects and end-to-end optimization with an audio-domain loss. Pairing this framework with a self-supervised training strategy enables automatic control of audio effects without the use of any labeled or paired training data. A survey of existing and new approaches for differentiable signal processing is presented, demonstrating how each can be integrated into the proposed framework along with a discussion of their trade-offs. The proposed approach is evaluated on both speech and music tasks, demonstrating generalization both to unseen recordings and even to sample rates different from those used during training. Convincing production style transfer results are demonstrated with the ability to transform input recordings into produced recordings, yielding audio effect control parameters that enable interpretability and user interaction.
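The abstract describes optimizing effect control parameters end to end with an audio-domain loss. The sketch below is heavily simplified and makes several assumptions: a single-parameter tanh drive stands in for the black-box effect, mean squared error serves as the audio loss, and central differences supply the gradients. The parameter is also fitted directly rather than predicted by a neural network, as in the paper. All names are hypothetical.

```python
import math


def black_box_drive(x, p):
    """Hypothetical stand-in for a black-box effect: tanh drive with pre-gain p[0]."""
    return [math.tanh(p[0] * s) for s in x]


def mse(a, b):
    """Audio-domain loss: mean squared error between two sample lists."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def loss_gradient(effect, x, target, p, eps=1e-4):
    """Central-difference gradient of mse(effect(x, p), target) w.r.t. each p_i."""
    grad = []
    for i in range(len(p)):
        up = list(p)
        up[i] += eps
        down = list(p)
        down[i] -= eps
        grad.append((mse(effect(x, up), target) - mse(effect(x, down), target)) / (2.0 * eps))
    return grad


def fit_parameters(effect, x, target, p0, lr=2.0, steps=400):
    """Plain gradient descent on the effect parameters using numerical gradients."""
    p = list(p0)
    for _ in range(steps):
        g = loss_gradient(effect, x, target, p)
        p = [pi - lr * gi for pi, gi in zip(p, g)]
    return p


# The target is rendered with a hidden pre-gain of 3.0; we start from 1.0.
sr = 8000
x = [0.5 * math.sin(2 * math.pi * 110 * n / sr) for n in range(800)]
target = black_box_drive(x, [3.0])
p_fit = fit_parameters(black_box_drive, x, target, [1.0])
```

In the full framework, the gradient with respect to the parameters would be propagated further into the network that predicts them. Here the fitted `p_fit[0]` should land close to the hidden value of 3.0.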


“…The system in [22] facilitates training of audio plugin parameters, or of a chain of plugins, for any desired transformation, given the appropriate training data. In this paper, we explore modifying the timbre of an undampened snare recording to elicit a perceptual change that corresponds to a dampened snare, referred to as Undampened-to-Dampened (U2D), using multiple audio effects and the tools presented in [22]. The inverse transformation is also examined, whereby a dampened snare recording is modified to perceptually emulate qualities of an undampened snare recording, referred to as Dampened-to-Undampened (D2U).…”

Section: Motivation (mentioning), confidence: 99%

“…In recent years, deep learning has demonstrated excellent performance in tasks such as emulating audio effects through end-to-end transformation methods [16][17][18], estimating audio effect parameters [19], mapping semantic descriptors to the parameter space of audio effects [20], and generating audio through differentiable digital signal processing [21]. More recently, Martinez et al. [22] emulated three common audio production tasks (i.e., mastering, breath/plosive removal, and tube amplification) using a deep encoder that parameterizes third-party audio effects within layers of the network.…”

Section: Introduction (mentioning), confidence: 99%

Deep Audio Effects for Snare Drum Recording Transformations

Cheshire, Drysdale, Enderby, et al. 2022. J. Audio Eng. Soc.


The ability to perceptually modify drum recording parameters in a post-recording process would be of great benefit to engineers limited by time or equipment. In this work, a data-driven approach to post-recording modification of the dampening and microphone positioning parameters commonly associated with snare drum capture is proposed. The system consists of a deep encoder that analyzes audio input and predicts optimal parameters of one or more third-party audio effects, which are then used to process the audio and produce the desired transformed output audio. Furthermore, two novel audio effects are specifically developed to take advantage of the multiple-parameter learning abilities of the system. Perceptual quality of the transformations is assessed through a subjective listening test, and an objective evaluation is used to measure system performance. Results demonstrate a capacity to emulate snare dampening; however, attempts to emulate microphone position changes were not successful.
