BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge

Kocour, Martin; Cámbara, Guillermo; Luque, Jordi; Bonet, David; Farrús, Mireia; Karafiát, Martin; Veselý, Karel; Černocký, Jan

doi:10.21437/iberspeech.2022-56

Cited by 2 publications

(2 citation statements)

References 10 publications


“…This submission leveraged the output of some of the ASR systems developed by the BCN2BRNO team for the Speech to Text Challenge [21]. It consisted of a primary system based on a fusion of three ASR systems (two of them based on an encoder-decoder transformer architecture: XLS-R Conformer and Whisper large model, and the third one based on an RNN transducer architecture) and a contrastive system based on the best single ASR system (XLS-R Conformer).…”

Section: Alignment and Validation Of Speech Signals With Partial And ...

mentioning

confidence: 99%

“…Speech to Text Challenge. A total of 13 different systems from four participating teams were submitted. The most relevant characteristics of each system are presented in terms of the recognition engine, and audio and text data used for training acoustic and language models. • BCN2BRNO [21]. BUT Speech@FIT research group (Brno University of Technology, Czech Republic) and Telefónica Research (Spain). BCN2BRNO submitted a primary system based on a word-level ROVER fusion of five individual models.…”

mentioning

confidence: 99%


An Overview of the IberSpeech-RTVE 2022 Challenges on Speech Technologies

et al. 2023


Evaluation campaigns provide a common framework with which the progress of speech technologies can be effectively measured. The aim of this paper is to present a detailed overview of the IberSpeech-RTVE 2022 Challenges, which were organized as part of the IberSpeech 2022 conference under the ongoing series of Albayzin evaluation campaigns. In the 2022 edition, four challenges were launched: (1) speech-to-text transcription; (2) speaker diarization and identity assignment; (3) text and speech alignment; and (4) search on speech. Different databases that cover different domains (e.g., broadcast news, conference talks, parliament sessions) were released for those challenges. The submitted systems also cover a wide range of speech processing methods, which include hidden Markov model-based approaches, end-to-end neural network-based methods, hybrid approaches, etc. This paper describes the databases, the tasks and the performance metrics used in the four challenges. It also provides the most relevant features of the submitted systems and briefly presents and discusses the obtained results. Despite employing state-of-the-art technology, the relatively poor performance attained in some of the challenges reveals that there is still room for improvement. This encourages us to carry on with the Albayzin evaluation campaigns in the coming years.

