ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/icassp43922.2022.9746005

The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks


Cited by 9 publications (18 citation statements)
References 34 publications

“…The overall operation of this module is represented by the transformation $\mathcal{R}: \mathbb{R}^{D\times B\times T} \to \mathbb{R}^{D\times B\times T}$ to obtain the output $\Lambda = \mathcal{R}(V) \in \mathbb{R}^{D\times B\times T}$. TF modeling with 8 residual GRU pairs accounts for 10.5M trainable parameters³.…”
Section: Time Frequency Modeling (mentioning; confidence: 99%)
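The quoted passage specifies only that the TF-modeling module maps a $D\times B\times T$ tensor to a tensor of the same shape using 8 residual GRU pairs. The NumPy sketch below shows one way such a shape-preserving stack can work. It assumes, without confirmation from the excerpt, that each "pair" is one GRU running along the time axis $T$ followed by one running along the band axis $B$, that each GRU's hidden size equals $D$ so the residual addition needs no projection, and that normalization layers are omitted. The names `GRU`, `ResidualGRUPair`, and `tf_modeling` are hypothetical, and the weights are random, so this illustrates data flow only, not the cited model.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRU:
    """Minimal unidirectional GRU whose hidden size equals its input size
    (assumed here so the residual connection needs no projection)."""

    def __init__(self, dim, rng):
        scale = 1.0 / np.sqrt(dim)
        # Index 0: update gate z, 1: reset gate r, 2: candidate state n.
        self.W = rng.uniform(-scale, scale, size=(3, dim, dim))  # input weights
        self.U = rng.uniform(-scale, scale, size=(3, dim, dim))  # recurrent weights
        self.b = np.zeros((3, dim))

    def __call__(self, x):
        """x has shape (L, N, D): sequence length L, batch N, features D."""
        length, batch, dim = x.shape
        h = np.zeros((batch, dim))
        out = np.empty_like(x)
        for step in range(length):
            xt = x[step]
            z = sigmoid(xt @ self.W[0] + h @ self.U[0] + self.b[0])
            r = sigmoid(xt @ self.W[1] + h @ self.U[1] + self.b[1])
            n = np.tanh(xt @ self.W[2] + (r * h) @ self.U[2] + self.b[2])
            h = (1.0 - z) * n + z * h
            out[step] = h
        return out


class ResidualGRUPair:
    """Hypothetical pair: a GRU over time, then a GRU over bands, each residual."""

    def __init__(self, dim, rng):
        self.time_rnn = GRU(dim, rng)
        self.band_rnn = GRU(dim, rng)

    def __call__(self, v):
        # v: (D, B, T). Time pass: sequence over T, batch over B.
        x = v.transpose(2, 1, 0)          # (T, B, D)
        x = x + self.time_rnn(x)
        # Band pass: sequence over B, batch over T.
        x = x.transpose(1, 0, 2)          # (B, T, D)
        x = x + self.band_rnn(x)
        return x.transpose(2, 0, 1)       # back to (D, B, T)


def tf_modeling(v, num_pairs=8, seed=0):
    """Apply num_pairs residual GRU pairs; output shape equals input shape."""
    rng = np.random.default_rng(seed)
    pairs = [ResidualGRUPair(v.shape[0], rng) for _ in range(num_pairs)]
    x = v
    for pair in pairs:
        x = pair(x)
    return x


# Usage: D=16 features, B=4 bands, T=10 frames (arbitrary toy sizes).
V = np.random.default_rng(1).standard_normal((16, 4, 10))
Lam = tf_modeling(V)
print(Lam.shape)  # (16, 4, 10)
```

Because every GRU output has the same feature size as its input and the two transposes are undone at the end of each pair, stacking any number of pairs keeps the $D\times B\times T$ shape, which matches the signature of $\mathcal{R}$ in the excerpt.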
“…This indicates that the gradient $\partial \hat{u}_i$ will be scaled down if the error on $v_i$ is high and vice versa, diluting the …”
Footnote 3: “Due to the computational complexity of backpropagation through time with long sequences, we experimented with replacing the RNNs with transformer encoders or convolutional layers. With similar numbers of parameters and all else being equal, these were not able to match the performance of an RNN-based module.…”
Section: F Loss Function (mentioning; confidence: 99%)
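The excerpt is cut off before the loss itself is given, so the following is only an illustration under an assumption. Suppose the per-source loss is the logarithm of an L1 error $e = \sum_k |v_k - \hat{u}_k|$; this choice is hypothetical here, although an SNR-style loss in decibels differs from it only by a constant scale and offset. The chain rule then gives $\partial \log e / \partial \hat{u}_k = (1/e)\,\partial e / \partial \hat{u}_k = -\operatorname{sign}(v_k - \hat{u}_k)/e$, so each gradient entry has magnitude $1/e$. A large error therefore yields a small gradient, which is the behavior the excerpt describes. The helper names below are hypothetical.

```python
import math


def _sign(x):
    return (x > 0) - (x < 0)


def log_l1_loss(v, u_hat):
    """Hypothetical per-source loss: log of the L1 error between target and estimate."""
    err = sum(abs(a - b) for a, b in zip(v, u_hat))
    return math.log(err)


def log_l1_grad(v, u_hat):
    """Analytic gradient of log_l1_loss w.r.t. u_hat: -sign(v_k - u_k) / err.

    Every entry has magnitude 1/err, so a larger error gives a smaller step.
    (Undefined when err == 0; the sign of 0 is taken as 0 for the subgradient.)
    """
    err = sum(abs(a - b) for a, b in zip(v, u_hat))
    return [-_sign(a - b) / err for a, b in zip(v, u_hat)]


v = [1.0, -2.0, 0.5, 3.0]
small = [1.1, -2.1, 0.4, 3.1]   # L1 error about 0.4
large = [2.0, -3.0, 1.5, 2.0]   # L1 error exactly 4.0
g_small = log_l1_grad(v, small)
g_large = log_l1_grad(v, large)
print(max(map(abs, g_small)), max(map(abs, g_large)))  # about 2.5 vs 0.25
```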
“…As a result, singing has a piecewise constant pitch with rapid pitch shifts and other sorts of variations. Until recently, various research strategies and algorithms have been introduced to improve the separation results in SVS tasks [22, 23]. The deep learning techniques [24, 25, 26, 27] are perhaps the most widely used for SVS.…”
Section: Introduction (mentioning; confidence: 99%)