ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9415103

Differentiable Signal Processing With Black-Box Audio Effects

Abstract: We present a data-driven approach to automate audio signal processing by incorporating stateful, third-party audio effects as layers within a deep neural network. We then train a deep encoder to analyze input audio and control effect parameters to perform the desired signal manipulation, requiring only input-target paired audio data as supervision. To train our network with non-differentiable black-box effects layers, we use a fast, parallel stochastic gradient approximation scheme within a standard auto diffe…
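The idea of training through a non-differentiable effect with a stochastic gradient approximation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a simultaneous-perturbation (SPSA-style) estimator, which is one well-known scheme of this kind (the truncated abstract above does not name the method), it optimizes the effect parameters directly instead of training an encoder, and `toy_effect`, `mse`, and `spsa_gradient` are hypothetical names standing in for a real plugin and loss.

```python
import random
from typing import Callable, List, Sequence


def toy_effect(x: Sequence[float], p: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a black-box effect: gain p[0] plus DC offset p[1]."""
    return [p[0] * s + p[1] for s in x]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def spsa_gradient(
    loss_of_p: Callable[[Sequence[float]], float],
    p: Sequence[float],
    eps: float = 1e-2,
    rng: random.Random = random.Random(0),
) -> List[float]:
    """Estimate the gradient of a scalar loss w.r.t. p from two loss evaluations.

    All parameters are perturbed at once by a random +/-1 vector, so the cost
    does not grow with the number of parameters.
    """
    delta = [rng.choice((-1.0, 1.0)) for _ in p]
    p_plus = [pi + eps * di for pi, di in zip(p, delta)]
    p_minus = [pi - eps * di for pi, di in zip(p, delta)]
    diff = loss_of_p(p_plus) - loss_of_p(p_minus)
    return [diff / (2.0 * eps * di) for di in delta]


# Input-target pair: the target was produced with gain 2.0 and offset 0.1.
x = [0.5, -0.25, 0.1]
target = toy_effect(x, [2.0, 0.1])


def loss(p: Sequence[float]) -> float:
    return mse(toy_effect(x, p), target)


p = [0.5, 0.0]
initial_loss = loss(p)
rng = random.Random(0)
for _ in range(500):
    g = spsa_gradient(loss, p, rng=rng)
    p = [pi - 0.3 * gi for pi, gi in zip(p, g)]  # plain gradient-descent step
final_loss = loss(p)
```

Each estimate needs only two calls to the effect regardless of how many parameters it has, and the two calls are independent, so they could be run in parallel.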

Cited by 16 publications (23 citation statements)
References 21 publications

“…Nevertheless, their performance drops significantly when processing real-world distorted guitar sounds, as we limited our experiments to two simplified methods for distortion, which represent only part of a commercial distortion effect. The positive outcomes of our experiments suggest that the neural models in this work can potentially be used to remove more complex distortion effects, if the training data includes such complex distortion algorithms (or their high-quality emulation, e.g., by using the frameworks [47] or [48]). We plan to validate this hypothesis in future work.…”
Section: Discussion (mentioning)
confidence: 99%
“…As a third alternative, nondifferentiable DSP implementations can be used directly with numerical gradient approximation methods, which has been demonstrated in audio effect modeling, removal of breaths, and music mastering [17]. In context of controlling audio effects $h(x, p)$ with input audio $x$ and parameters $p \in \mathbb{R}^P$, we only need to estimate the partial derivatives for each parameter $\frac{\partial}{\partial p_i} h(x, p)$.…”
Section: Alternative Differentiation Methods (mentioning)
confidence: 99%
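The per-parameter partial derivatives mentioned in the statement above can be approximated numerically without any access to the effect's internals. The sketch below uses central finite differences, which is one simple numerical scheme and not necessarily the one used in [17]; `gain_and_clip` is a hypothetical toy effect standing in for a real black-box plugin, and `finite_difference_jacobian` is an illustrative helper name.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def gain_and_clip(x: Sequence[float], p: Sequence[float]) -> Signal:
    """Hypothetical black-box effect h(x, p): gain p[0], then hard clip at +/- p[1]."""
    gain, threshold = p
    return [max(-threshold, min(threshold, gain * s)) for s in x]


def finite_difference_jacobian(
    h: Callable[[Sequence[float], Sequence[float]], Signal],
    x: Sequence[float],
    p: Sequence[float],
    eps: float = 1e-3,
) -> List[Signal]:
    """Return one row per parameter i, approximating d h(x, p) / d p_i sample by sample."""
    rows = []
    for i in range(len(p)):
        p_plus = list(p)
        p_minus = list(p)
        p_plus[i] += eps
        p_minus[i] -= eps
        y_plus = h(x, p_plus)
        y_minus = h(x, p_minus)
        rows.append([(a - b) / (2.0 * eps) for a, b in zip(y_plus, y_minus)])
    return rows


x = [0.1, -0.2, 0.9]
p = [2.0, 1.0]  # gain, clip threshold
jac = finite_difference_jacobian(gain_and_clip, x, p)
# jac[0] holds the sensitivity of each output sample to the gain,
# jac[1] the sensitivity to the clip threshold.
```

This costs two effect evaluations per parameter, so it scales linearly with the number of parameters P.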
“…However, this approach requires manual implementation and often modification of DSP, imparting high engineering cost, potentially limiting its application. To work around this, two alternative methods have been proposed, including neural proxies (NP) of audio effects [16] and numerical gradient approximation schemes [17]. However, given that these approaches were proposed and evaluated in different tasks, it is difficult to fully understand their relative performance.…”
Section: Introduction (mentioning)
confidence: 99%
“…The system in [22] facilitates training of audio plugin parameters or a chain of plugins for any desired transformation, given the appropriate training data. In this paper, the ability to modify the timbre of an undampened snare recording in order to elicit a perceptual change that corresponds to that of a dampened snare, referred to as Undampened-to-Dampened (U2D), will be explored through the use of multiple audio effects by utilizing the tools presented in [22]. The inverse transformation is also examined, whereby a dampened snare recording is modified to perceptually emulate qualities of an undampened snare recording, referred to as Dampened-to-Undampened (D2U).…”
Section: Motivation (mentioning)
confidence: 99%
“…In recent years, deep learning has demonstrated excellent performance in tasks such as emulating audio effects through end-to-end transformation methods [16][17][18], estimating audio effect parameters [19], mapping semantic descriptors to the parameter space of audio effects [20], and generating audio through differentiable digital signal processing [21]. More recently, Martinez et al. [22] emulated three common audio production tasks (i.e., mastering, breath/plosive removal, and tube amplification) through the use of a deep encoder, which performs parameterization of third-party audio effects within layers of the network.…”
Section: Introduction (mentioning)
confidence: 99%