“…While promising results were shown, all previous studies were limited to either simulated data [16,17,18,19,20,21,22,24,25,26,27] or small-scale real data [7,9,8,10]. This is due to the difficulty of collecting real meeting recordings with precise transcriptions at large scale.…”