“…It consists in using some auxiliary information about the sources and/or the mixing process to guide the separation. For example, score-informed approaches rely on a musical score to guide the separation in music recordings [3][4][5][6], separation-by-humming (SbH) algorithms exploit a sound "hummed" by the user mimicking the source of interest [7,8], and user-guided approaches take into account knowledge about, e.g., a user-selected F0 track [9] or user-annotated source activity patterns along the spectrogram of the mixture [10,11] and/or that of the estimated sources [12,13]. In line with this direction, there are also speech separation systems informed, e.g., by speaker gender [14], by the corresponding video [15], or by natural language structure [16].…”