2021
DOI: 10.48550/arxiv.2110.15731
Preprint
CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese

Cited by 4 publications (11 citation statements); references 0 publications.
“…This section describes a set of open corpora used to benchmark the two considered ASR systems (Section 3.1), followed by the process used to derive a set of keywords to be spotted by those systems (Section 3.2), and the description of an in-domain dataset built to test the considered models (Section 3.3). The benchmark corpora include Debating technologies [44] (audio recordings from English transcribed public debates), the Polish Parliamentary Corpus [45] (recordings from the Polish parliament), and CORAA [20] (a combination of five corpora in Portuguese). Different public corpora were considered to train/test the ASR and KWS models. Wav2vec2.0 models were fine-tuned using the Common Voice corpus [38] for each considered language.…”
Section: Methods (mentioning)
confidence: 99%
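The keyword-spotting step described in the quote above (matching a derived keyword set against ASR output) can be illustrated with a minimal sketch. This is not code from the cited paper; the function name and the simple word-level matching strategy are assumptions for illustration only — real KWS systems typically work on acoustic scores or lattices rather than final transcripts.

```python
import re
from typing import List, Tuple

def spot_keywords(transcript: str, keywords: List[str]) -> List[Tuple[str, int]]:
    """Naive KWS over an ASR transcript: return (keyword, word_index)
    pairs for every transcript word that matches the keyword set."""
    words = re.findall(r"\w+", transcript.lower())
    keyword_set = {k.lower() for k in keywords}
    return [(w, i) for i, w in enumerate(words) if w in keyword_set]

# Example: spotting two keywords in a short transcript.
hits = spot_keywords("The parliament debated the budget", ["parliament", "budget"])
# → [("parliament", 1), ("budget", 4)]
```

A transcript-level matcher like this only finds keywords the ASR already recognized correctly, which is why the cited work evaluates the ASR and KWS components together.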
“…In particular, the authors in [19] considered a Wav2vec2.0 model combined with their proposed language modeling approach and achieved state-of-the-art results on the German Common Voice corpus, with a WER of 3.7%. Wav2vec2.0-based models have also been successfully tested in more adverse acoustic environments, such as the multimedia Portuguese data from the CORAA database [20]. For these reasons, Wav2vec2.0 has become one of the most widely adopted neural models for ASR.…”
Section: Introduction (mentioning)
confidence: 99%
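The word error rate (WER) cited above is the standard ASR metric: the word-level edit distance (substitutions, insertions, deletions) between reference and hypothesis, divided by the reference length. A minimal self-contained implementation, independent of the cited works:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # One-row dynamic-programming edit distance over words.
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag = row[0]
        row[0] = i
        for j, h in enumerate(hyp, 1):
            cur = row[j]
            row[j] = min(row[j] + 1,        # deletion
                         row[j - 1] + 1,    # insertion
                         prev_diag + (r != h))  # substitution/match
            prev_diag = cur
    return row[len(hyp)] / len(ref)

# One substituted word out of three reference words → WER = 1/3.
print(wer("the cat sat", "the bat sat"))
```

In practice, libraries such as jiwer are commonly used for this computation, often with text normalization applied first.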
“…The fine-tuning aims to train the model for the task of speech recognition in Portuguese. Due to hardware limitations, a model made available by the scientific community [Junior et al 2021] was used, which was trained on approximately 290 hours of Portuguese audio accompanied by its transcriptions. Training lasted 20 epochs.…”
Section: Prediction with Wav2vec2 (unclassified)
“…This section presents four ASR models evaluated for the task of transcribing NURC/SP audio: [Candido Junior et al 2021], [Ferreira and Oliveira 2022], [Grosman 2022], and [Stefanel Gris et al 2022]. These works are based on Wav2vec 2.0 and were trained for Portuguese speech recognition using the corpora presented in Section 2.2.2.…”
Section: Trained Models for Portuguese (mentioning)
confidence: 99%
“…While the NURC-Recife corpus took seven years to be processed and made publicly available, we hope that using speech processing tools will help bring NURC/SP quickly to digital life. Examples of such systems include the automatic speech recognizer (ASR) chosen in this study (the model trained with the CORAA ASR corpus [Candido Junior et al 2021]), the aeneas forced aligner used to synchronize audio and transcription, and the forced phonetic aligner Alinha-PB used in conjunction with a prosody-based utterance segmentation method.…”
Section: Introduction (mentioning)
confidence: 99%
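A forced aligner, such as the aeneas tool mentioned above, assigns each transcript unit a start/end time within the audio. The sketch below is a deliberately naive stand-in for illustration only (it is not aeneas, which aligns against synthesized speech): it distributes a known audio duration across words in proportion to their character length.

```python
from typing import List, Tuple

def uniform_align(words: List[str], audio_duration: float) -> List[Tuple[str, float, float]]:
    """Toy 'alignment': assign each word a time span proportional to
    its character length within the total audio duration."""
    total_chars = sum(len(w) for w in words)
    spans, t = [], 0.0
    for w in words:
        dur = audio_duration * len(w) / total_chars
        spans.append((w, round(t, 3), round(t + dur, 3)))
        t += dur
    return spans

# Two equal-length words over 2 seconds get one second each.
print(uniform_align(["ab", "ab"], 2.0))
# → [("ab", 0.0, 1.0), ("ab", 1.0, 2.0)]
```

Real aligners refine such spans using acoustic evidence; the output format (unit, start, end) is the part this sketch shares with them.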