“…This variance reduction works well with the expectation-maximization (EM) algorithm [2] but not with the convolutional neural network U-Net [11]. Including IPD cues in SONET [11], whether the values observed from the mixture or those obtained after phase unwrapping by the top-down approach, degraded its output performance. The two speech separation models, one using the EM algorithm and the other using the SONET-P network to cluster the IPD cues, are compared in the experiment section (Section V).…”