“…Multi-talker speech recognition aims to recognize each individual speech source in overlapped speech, and remains one of the main challenges for current ASR systems [1,2,3,4,5,6,7,8]. Current solutions for multi-speaker speech recognition fall into two main approaches: (i) applying separation-based front-end processing to the overlapped speech, then running ASR on the separated speech signals [9,10,11,12,13,14,15]; or (ii) skipping the explicit separation step and building a multi-speaker speech recognition system directly, using either hybrid [16, 17, ?…”