2020
DOI: 10.48550/arxiv.2004.09249
Preprint
CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

Cited by 50 publications (39 citation statements)
References 35 publications
“…However, speech data is notoriously difficult to work with for machine learning practitioners. Recordings of speech come in many flavors: as isolated utterances in separate files (e.g., LibriSpeech [13]); long, continuous recordings of podcasts and conversations (e.g., GigaSpeech [7]); or even multi-channel recordings from multiple microphone arrays (e.g., AMI [10], CHiME-6 [18]). Audio is encoded with a variety of codecs, both common (e.g., PCM, FLAC, OPUS) and obscure (e.g., sphere, shorten).…”
Section: Introduction
Mentioning confidence: 99%
“…Multi-talker speech recognition is focused on recognizing individual speech sources from overlap speech, and is one main challenge for current ASR systems [1,2,3,4,5,6,7,8]. Current solutions for multi-speaker speech recognition can be categorized into two main approaches: (i) performing frontend speech processing based on separation on the overlap speech, then applying ASR to the separated speech signals [9,10,11,12,13,14,15]; or (ii) skipping the explicit separation step and developing a multi-speaker speech recognition system directly using either hybrid [16, 17, ?…”
Section: Introduction
Mentioning confidence: 99%
“…However, current end-to-end approaches have been reported to be strongly overfitted to the environments that they are trained for, not generalising to diverse real-world conditions. Therefore, the winning entries to recent diarisation challenges [9][10][11] are based on the former method, and this will also be the focus of this paper.…”
Section: Introduction
Mentioning confidence: 99%