“…Lately, there has been increasing interest in nonlinear models, specifically, Deep Neural Networks (DNNs) [21,22,23,24]. In Deep Clustering (DPCL) [25,26], first, the timefrequency bins of the mixtures are mapped into an embedding space; then, a clustering algorithm is performed in the embedding space; finally, a binary mask is generated based on each cluster to reconstruct speech of each speaker.…”