2022
DOI: 10.48550/arxiv.2202.04261
Preprint

The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

Abstract: This paper describes our submission to ICASSP 2022 Multi-channel Multi-party Meeting Transcription (M2MeT) Challenge. For Track 1, we propose several approaches to empower the clustering-based speaker diarization system to handle overlapped speech. Front-end dereverberation and the direction-of-arrival (DOA) estimation are used to improve the accuracy of speaker diarization. Multi-channel combination and overlap detection are applied to reduce the missed speaker error. A modified DOVER-Lap is also proposed to …

Cited by 1 publication (1 citation statement)
References 20 publications (28 reference statements)
“…Noise augmentation, reverberation simulation, speed perturbation and SpecAugmentation are the mainstream methods with stable performance improvement. According to the report provided by second-place team B24 [55], relative CER reduction of 13.5% can be achieved by multi-channel multi-speaker data simulation as compared with the baseline trained using Train-Ali-far. Compared with speaker diarization, data simulation for multi-speaker ASR is more complex, which needs to consider various factors such as speaker turn and conversation duration.…”
Section: Data Augmentation
Citation type: mentioning
confidence: 99%