2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP) 2020
DOI: 10.1109/mlsp49062.2020.9231900

Sudo RM -RF: Efficient Networks for Universal Audio Source Separation

Cited by 77 publications (61 citation statements)
References 16 publications
“…The proposed model is based on the learned-domain masking approach [14, 15, 17-22] and employs an encoder, a decoder, and a masking network, as shown in Figure 1. The encoder is fully convolutional, while the masking network employs two Transformers embedded inside the dual-path processing block proposed in [17].…”
Section: The Model (mentioning)
confidence: 99%
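The encoder, masking network, and decoder arrangement described in this statement can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the random filters, and especially `toy_masks`, which is a fixed softmax placeholder standing in for the Transformer-based masking network of the citing paper. It is not that paper's model.

```python
import math
import random


def encode(x, filters, stride):
    """Strided 1-D convolution with ReLU: one latent frame per hop."""
    k = len(filters[0])
    frames = []
    for start in range(0, len(x) - k + 1, stride):
        window = x[start:start + k]
        frames.append(
            [max(0.0, sum(w * s for w, s in zip(f, window))) for f in filters]
        )
    return frames


def toy_masks(latent, n_sources):
    """Placeholder masker (hypothetical): softmax over sources per latent bin."""
    masks = [[[0.0] * len(frame) for frame in latent] for _ in range(n_sources)]
    for t, frame in enumerate(latent):
        for c in range(len(frame)):
            scores = [math.exp(math.sin(s + c + t)) for s in range(n_sources)]
            total = sum(scores)
            for s in range(n_sources):
                masks[s][t][c] = scores[s] / total
    return masks


def decode(frames, filters, stride, length):
    """Transposed 1-D convolution implemented as overlap-add."""
    k = len(filters[0])
    y = [0.0] * length
    for t, frame in enumerate(frames):
        for coeff, f in zip(frame, filters):
            for j in range(k):
                y[t * stride + j] += coeff * f[j]
    return y


def separate(x, enc_filters, dec_filters, stride, n_sources):
    """Encode the mixture, mask the latent per source, decode each source."""
    latent = encode(x, enc_filters, stride)
    masks = toy_masks(latent, n_sources)
    estimates = []
    for mask in masks:
        masked = [[m * v for m, v in zip(mrow, vrow)]
                  for mrow, vrow in zip(mask, latent)]
        estimates.append(decode(masked, dec_filters, stride, len(x)))
    return estimates


random.seed(0)
mixture = [random.uniform(-1.0, 1.0) for _ in range(64)]
enc = [[random.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]
dec = [[random.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]
sources = separate(mixture, enc, dec, stride=4, n_sources=2)
print(len(sources), len(sources[0]))
```

Because the placeholder masks sum to one in every latent bin and the decoder is linear, the estimated sources add up to the decoded, unmasked latent mixture. A trained masking network is not required to have this property.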
“…Although the selected Sudo rm -rf configuration has a large enough receptive field to cover the entire sequential feature, it obtains an even worse separation performance with on-par model size and complexity as the TCN architecture. Although [15] reported that the Sudo rm -rf architecture achieved constantly better performance than DPRNN and TCN architectures, the results here indicates that its performance on the more challenging noisy reverberant environments needs to be revised. Moreover, although all four architectures achieve significant SI-SDR improvement with respect to the unprocessed mixture, the improvement on wideband PESQ and STOI scores are moderate.…”
Section: Effect of GC3 in Different Separation Modules (mentioning)
confidence: 64%
“…Since the context codec squeezes the long sequence by a factor of C/2 (16 for C = 32), the effective temporal receptive field of the TCN separator is significantly larger (0.253 × 16 = 4.05s) and thus can better capture the temporal dependencies. Since it has also been reported in [15] that a deeper Sudo rm -rf architecture can lead to better overall separation performance, introducing GC3 to Sudo rm -rf might also be equivalent to increasing the model depth and improves the performance. More in-depth analysis on the reason behind the performance improvements in different architectures is left for future work.…”
Section: Effect of GC3 in Different Separation Modules (mentioning)
confidence: 99%
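The receptive-field arithmetic quoted in this statement can be checked with a short sketch. The function names are hypothetical; the numbers (a 0.253 s base receptive field, C = 32, and a squeeze factor of C/2) are taken from the quoted text.

```python
def squeeze_factor(context_size):
    """Squeeze factor of the context codec as stated in the quote: C / 2."""
    return context_size // 2


def effective_receptive_field(base_seconds, context_size):
    """Base receptive field scaled by the context codec's squeeze factor."""
    return base_seconds * squeeze_factor(context_size)


rf = effective_receptive_field(0.253, 32)
print(f"{rf:.2f} s")  # 0.253 * 16 = 4.048, which the quote rounds to 4.05 s
```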