The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

Chen, Shen; Liu, Yi; Fan, Wenzhi; Wang, Bin; Wen, Shixue; Tian, Ye; Zhang, Jun; Yang, Jian; Ma, Zhuo

doi:10.48550/arxiv.2202.04261

Cited by 1 publication (1 citation statement)

References 20 publications (28 reference statements)

“…Noise augmentation, reverberation simulation, speed perturbation and SpecAugmentation are the mainstream methods with stable performance improvement. According to the report provided by second-place team B24 [55], relative CER reduction of 13.5% can be achieved by multi-channel multi-speaker data simulation as compared with the baseline trained using Train-Ali-far. Compared with speaker diarization, data simulation for multi-speaker ASR is more complex, which needs to consider various factors such as speaker turn and conversation duration.…”

Section: Data Augmentation (mentioning)

confidence: 99%

Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

Yang, Zhang, Guo et al. 2022

Preprint

The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge (M2MeT) focuses on one of the most valuable and most challenging scenarios of speech technologies. The M2MeT challenge set up two tracks: speaker diarization (track 1) and multi-speaker automatic speech recognition (ASR) (track 2). Along with the challenge, we released 120 hours of real-recorded Mandarin meeting speech data with manual annotation, including far-field data collected by an 8-channel microphone array as well as near-field data collected by each participant's headset microphone. We briefly describe the released dataset, track setups, and baselines, and summarize the challenge results and the major techniques used in the submissions.
