“…The RIRs are generated using the image method [23]. Three sizes of near-end and far-end rooms are selected: [4, 3, 3] m, [6, 4, 3] m, and [8, 7, 3] m. The reverberation time is set to 0.3 s, 0.6 s, or 0.9 s. The distances between the loudspeakers and between the microphones are set to 2.0 m and 0.4 m, respectively. The distance between each speaker position and the center of the microphones is set to [0.3, 0.7, 1.1] m. The near-end speech is mixed with the echo signals at a signal-to-echo ratio (SER) randomly chosen from [0, 5, 10, 15] dB.…”
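
The SER mixing step described in the excerpt can be illustrated with a minimal sketch. It assumes SER is defined as the ratio of near-end speech power to echo power in dB, and that the echo (rather than the near-end speech) is rescaled to reach the target; the excerpt does not state which signal is scaled. The function names `power`, `mix_at_ser`, and `measured_ser_db` are hypothetical and chosen only for illustration, and the random signals stand in for real speech and echo.

```python
import math
import random

# SER values quoted in the excerpt, in dB.
SER_CHOICES_DB = [0, 5, 10, 15]


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_ser(near_end, echo, ser_db):
    """Scale `echo` so that 10*log10(P_near / P_echo) equals `ser_db`,
    then add it to `near_end`.

    Assumption: the echo is rescaled and the near-end speech is left
    untouched. Returns (mixture, scaled_echo).
    """
    if len(near_end) != len(echo):
        raise ValueError("signals must have the same length")
    p_near, p_echo = power(near_end), power(echo)
    if p_near == 0 or p_echo == 0:
        raise ValueError("both signals must have non-zero power")
    # Solve P_near / (gain^2 * P_echo) = 10^(ser_db / 10) for gain.
    gain = math.sqrt(p_near / (p_echo * 10 ** (ser_db / 10)))
    scaled_echo = [gain * e for e in echo]
    mixture = [s + e for s, e in zip(near_end, scaled_echo)]
    return mixture, scaled_echo


def measured_ser_db(near_end, echo):
    """SER in dB between a near-end signal and an echo signal."""
    return 10 * math.log10(power(near_end) / power(echo))


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder noise signals standing in for speech and echo.
    near = [rng.gauss(0, 1) for _ in range(16000)]
    echo = [rng.gauss(0, 0.3) for _ in range(16000)]
    ser_db = rng.choice(SER_CHOICES_DB)  # random draw, as in the excerpt
    mixture, scaled_echo = mix_at_ser(near, echo, ser_db)
    print("target SER (dB):", ser_db)
    print("measured SER (dB):", round(measured_ser_db(near, scaled_echo), 6))
```

Scaling the echo keeps the near-end speech level fixed across examples; scaling the near-end speech instead would satisfy the same SER constraint and is an equally plausible reading of the text.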