“…The noise and reverberation levels are frequently selected independently from each other without a specific application in mind, e.g., [20,21], therefore some of them might never happen in real life. Furthermore, they are often selected within a discrete set of values, e.g., [11,14,22,23,24] or a narrow range of values, e.g., [25], which does not match the actual distribution of levels observed in real life and artificially advantages learning-based methods which may overfit those levels. Even when the distortion levels are realistic, there may still exist some acoustic mismatch, due to recording speech in a different place than noise and reverberation, e.g., [26,27].…”