ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9413901

Attention Is All You Need In Speech Separation

Abstract: Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-to-sequence learning. RNNs, however, are inherently sequential models that do not allow parallelization of their computations. Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism. In this paper, we propose the SepFormer, a novel RNN-free Transformer-based neural network for speech separation. The SepFormer learns short and long-term depend…
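The abstract describes replacing recurrence with multi-head attention applied in a dual-path fashion: one Transformer models short-term dependencies within chunks, another models long-term dependencies across chunks. Below is a minimal PyTorch sketch of that dual-path idea; the layer sizes, chunk dimensions, and use of stock nn.TransformerEncoderLayer are illustrative assumptions, not the paper's exact SepFormer configuration.

```python
import torch
import torch.nn as nn

d_model, n_heads = 64, 4  # illustrative sizes, not the paper's values
intra = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
inter = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

# x: encoded mixture after chunking, (batch, num_chunks, chunk_len, d_model)
x = torch.randn(2, 10, 50, d_model)
b, s, k, d = x.shape

# Intra-chunk Transformer: attention over positions within each chunk,
# capturing short-term dependencies.
x = intra(x.reshape(b * s, k, d)).reshape(b, s, k, d)

# Inter-chunk Transformer: attention over chunks at each within-chunk offset,
# capturing long-term dependencies.
x = x.transpose(1, 2).reshape(b * k, s, d)
x = inter(x).reshape(b, k, s, d).transpose(1, 2)
```

Because each attention call sees either one chunk or one stride of chunks, both passes parallelize over the batch dimension, which is the parallelization advantage over RNNs that the abstract emphasizes.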

Cited by 309 publications (216 citation statements).
References 26 publications.
“…From the results of experiments, we conclude that DF-Conformer is an effective model for SE. Future works include joint-training of SE and ASR using an all Conformer model, and comparison with the dual-path methods [16][17][18] on the SE task.…”
Section: Discussion (mentioning)
confidence: 99%
“…One possible approach is to use the dual-path approach [16][17][18], which is equivalent to using sparse and block-diagonal attention matrices corresponding to the inter- and intra-transformers, respectively. Alternatively, we use FAVOR+ attention introduced in Performer [26], which has linear computational complexity: O(N).…”
Section: Model Structure and Computational Challenges (mentioning)
confidence: 99%
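The statement above claims the dual-path scheme is equivalent to full self-attention restricted by a block-diagonal matrix (intra-transformer) and a complementary sparse matrix (inter-transformer). The short sketch below makes that concrete by constructing both boolean attention patterns; the sequence length and chunk size are illustrative assumptions.

```python
import torch

seq_len, chunk = 12, 4  # illustrative, assumes seq_len divisible by chunk
pos = torch.arange(seq_len)

# Intra-transformer pattern: block-diagonal, a position only attends to
# positions inside its own chunk.
intra_mask = (pos.unsqueeze(0) // chunk) == (pos.unsqueeze(1) // chunk)

# Inter-transformer pattern: sparse, positions that share the same offset
# within their chunk attend to each other across chunks.
inter_mask = (pos.unsqueeze(0) % chunk) == (pos.unsqueeze(1) % chunk)

print(intra_mask.int())  # blocks of ones along the diagonal
print(inter_mask.int())  # ones on a strided grid
```

Printing the two masks shows why the pair is cheaper than dense attention: each row of either mask has only `chunk` or `seq_len // chunk` nonzero entries instead of `seq_len`.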
“…Recent advances in deep learning have enabled the development of neural network architectures capable of separating individual sound sources from mixtures of sounds with high fidelity. Discriminative separation models with supervised training have obtained state-of-the-art performance on multiple tasks such as music separation [1], speech separation [2,3] and speech enhancement [4,5]. However, gathering clean source waveforms to perform supervised training under various domains can be cumbersome or even impossible.…”
Section: Introduction (mentioning)
confidence: 99%
“…The main focus of TasNet is the separator that estimates the masks. A lot of work has since been done to improve the separator, such as fully-convolutional TasNet (Conv-TasNet) [12], dual-path recurrent neural network (DPRNN) [19], gated DualPathRNN [14], dual-path transformer network (DPT-Net) [15], and SepFormer [20]. Among them, the dual-path method is the mainstream, which processes the waveform along two dimensions: the local path and the global path.…”
Section: Introduction (mentioning)
confidence: 99%
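This last statement frames SepFormer as one of several separators in the TasNet pipeline: a learned encoder produces a latent representation of the mixture, the separator estimates one mask per source, and a decoder reconstructs each waveform. A minimal sketch of that pipeline follows, assuming simple convolutional encoder/decoder layers and a placeholder one-layer separator where any of the cited separators (DPRNN, DPT-Net, SepFormer) would actually go; all sizes are illustrative.

```python
import torch
import torch.nn as nn

n_filters, kernel, n_src = 64, 16, 2  # illustrative hyperparameters
encoder = nn.Conv1d(1, n_filters, kernel, stride=kernel // 2)
# Placeholder separator: in practice this is the dual-path network.
separator = nn.Sequential(
    nn.Conv1d(n_filters, n_filters * n_src, 1),
    nn.Sigmoid(),  # masks in [0, 1]
)
decoder = nn.ConvTranspose1d(n_filters, 1, kernel, stride=kernel // 2)

mix = torch.randn(1, 1, 8000)                  # 0.5 s of 16 kHz audio
feats = encoder(mix)                           # (1, n_filters, frames)
masks = separator(feats).chunk(n_src, dim=1)   # one mask per source
sources = [decoder(feats * m) for m in masks]  # estimated source waveforms
```

The "local path" and "global path" mentioned in the quote live entirely inside the separator: the encoder output is chunked, and intra-chunk (local) and inter-chunk (global) processing alternate before the masks are emitted.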