ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053697
Interrupted and Cascaded Permutation Invariant Training for Speech Separation

Abstract: Permutation Invariant Training (PIT) has long been a stepping-stone method for training speech separation models in handling the label ambiguity problem. With PIT selecting the minimum-cost label assignments dynamically, very few studies have considered the separation problem to be one of optimizing both the model parameters and the label assignments; most have instead focused on searching for good model architectures and parameters. In this paper, we investigate instead, for a given model architecture, the various flexible label assignment…

Cited by 8 publications (11 citation statements). References 23 publications.
“…Our SI-SNRi results (16.5 and 17.5 dB) are promising. Further, we also confirmed the reduction of permutation errors and generalization of improvements to other test sets, which was not tested in [7,8].…”
Section: Additional Discussion: Previous Results on WSJ0-2mix (supporting)
confidence: 70%
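For reference, the SI-SNRi figures quoted above measure scale-invariant SNR improvement over the unprocessed mixture. Below is a minimal sketch of that metric, assuming 1-D NumPy signals; it is an illustration, not the cited work's exact evaluation code.

```python
# Sketch of SI-SNR and SI-SNRi (in dB), assuming 1-D numpy arrays for
# the estimate, the reference, and the input mixture.
import numpy as np

def si_snr(est: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between an estimate and a reference."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference (scale-invariant target).
    s_target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    e_noise = est - s_target
    return 10 * np.log10((np.dot(s_target, s_target) + eps)
                         / (np.dot(e_noise, e_noise) + eps))

def si_snr_improvement(est, ref, mix) -> float:
    """SI-SNRi: gain over using the unprocessed mixture as the estimate."""
    return si_snr(est, ref) - si_snr(mix, ref)
```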
“…Prob-PIT [7] considers the probabilities of all utterance level permutations, rather than just the best one, improving the initial training stage when wrong alignments are likely to happen. A similar idea is employed by Yang et al [8], who trained a Conv-TasNet with uPIT and fixed alignments in turns, reporting 17.5 dB SI-SNRi. They also implemented Prob-PIT for Conv-TasNet and obtained 15.9 dB.…”
Section: Additional Discussion: Previous Results on WSJ0-2mix (mentioning)
confidence: 99%
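The Prob-PIT objective described in this statement replaces the hard minimum over permutations with a soft score over all of them. A hedged sketch of that idea in PyTorch follows; the tensor shapes, the per-pair MSE cost, and the temperature `gamma` are illustrative assumptions, not the cited papers' exact formulation.

```python
# Sketch of the Prob-PIT idea: instead of keeping only the single best
# permutation, soft-min over the costs of all permutations, so early in
# training every alignment contributes to the gradient.
import itertools
import torch

def prob_pit_loss(est: torch.Tensor, ref: torch.Tensor,
                  gamma: float = 1.0) -> torch.Tensor:
    """est, ref: (batch, n_src, time). Returns a scalar loss."""
    n_src = est.shape[1]
    perm_costs = []
    for perm in itertools.permutations(range(n_src)):
        # Mean-squared error under this speaker-to-output assignment.
        cost = ((est - ref[:, list(perm), :]) ** 2).mean(dim=(1, 2))
        perm_costs.append(cost)                     # each: (batch,)
    costs = torch.stack(perm_costs, dim=1)          # (batch, n_perms)
    # Soft-min via log-sum-exp over permutation costs.
    loss = -gamma * torch.logsumexp(-costs / gamma, dim=1)
    return loss.mean()
```

As `gamma` shrinks, the soft-min collapses to the hard PIT minimum, which is why the two objectives differ mainly in the early training stage the statement mentions.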
“…This is the technique used in the Permutation Invariant Training (PIT) speech separation method, which has been shown to be effective in addressing the permutation ambiguity [35]. However, it has been argued [39], [37] that the hard decision of choosing the minimum cost as the best solution results in training a sub-optimal separation model. More specifically, choosing the correct separation error is harder in the initial epochs of training, when the network is still naive and its outputs are not reliable.…”
Section: Problem Formulation (mentioning)
confidence: 99%
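The hard-decision PIT objective this statement refers to evaluates the training loss under every output-to-speaker permutation and keeps only the cheapest one per utterance. A minimal sketch, assuming (batch, n_src, time) tensors and an MSE pair cost:

```python
# Sketch of the hard-min PIT loss: compute the cost of every
# permutation and keep the minimum per utterance.
import itertools
import torch

def pit_loss(est: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """est, ref: (batch, n_src, time). Returns the min-cost scalar loss."""
    n_src = est.shape[1]
    costs = torch.stack([
        ((est - ref[:, list(perm), :]) ** 2).mean(dim=(1, 2))
        for perm in itertools.permutations(range(n_src))
    ], dim=1)                          # (batch, n_perms)
    min_cost, _ = costs.min(dim=1)     # hard decision per utterance
    return min_cost.mean()
```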
“…Although PIT forces the frames belonging to the same speaker to be aligned with the same output stream, frames inside one utterance can still flip between different sources, leading to poor separation performance. Alternatively, the initial PIT-based separation model can be further trained with a fixed-label training strategy [3], or a long-term dependency can be imposed on the output streams by adding an additional speaker identity loss [4,5]. Another issue in blind source separation is that the speaker order of the separated signals during inference is also unknown, and needs to be identified by a speaker recognition system.…”
Section: Introduction (mentioning)
confidence: 99%
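The fixed-label strategy mentioned here can be sketched as a two-stage recipe: record the permutation the current PIT model prefers for each utterance, then continue training with that assignment frozen so frames can no longer flip between output streams. The model, optimizer, and MSE loss below are illustrative assumptions, not the exact procedure of [3].

```python
# Sketch of fixed-label training after an initial PIT stage.
import itertools
import torch

def best_permutation(est, ref):
    """Index of the min-cost permutation per utterance, plus the perm list."""
    n_src = est.shape[1]
    perms = list(itertools.permutations(range(n_src)))
    costs = torch.stack([
        ((est - ref[:, list(p), :]) ** 2).mean(dim=(1, 2)) for p in perms
    ], dim=1)                                   # (batch, n_perms)
    return costs.argmin(dim=1), perms

def fixed_label_step(model, opt, mix, ref, frozen_perm_idx, perms):
    """One optimization step with the stored (frozen) assignment."""
    est = model(mix)
    ref_aligned = torch.stack(
        [ref[b, list(perms[int(i)]), :]
         for b, i in enumerate(frozen_perm_idx)])
    loss = ((est - ref_aligned) ** 2).mean()    # no permutation search here
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```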