2021
DOI: 10.48550/arxiv.2110.15731
Preprint
CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese

Cited by 4 publications (11 citation statements); references 0 publications.
“…This section describes a set of open corpora used to benchmark the two considered ASR systems (Section 3.1), followed by the process used to derive a set of keywords to be spotted by those systems (Section 3.2), and the description of an in-domain dataset built to test the considered models (Section 3.3). The benchmark corpora include Debating technologies [44] (audio recordings from English transcribed public debates), the Polish Parliamentary Corpus [45] (recordings from the Polish parliament), and CORAA [20] (a combination of five corpora in Portuguese). Different public corpora were considered to train/test the ASR and KWS models. Wav2vec2.0 models were fine-tuned using the Common Voice corpus [38] for each considered language.…”
Section: Methods (mentioning)
confidence: 99%
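The keyword-spotting step described in the quote above (matching a derived keyword set against ASR output) can be illustrated with a minimal sketch. This is not code from the cited paper; the function name and the simple word-level matching strategy are assumptions for illustration only — real KWS systems typically work on acoustic scores or lattices rather than final transcripts.

```python
import re
from typing import List, Tuple

def spot_keywords(transcript: str, keywords: List[str]) -> List[Tuple[str, int]]:
    """Naive KWS over an ASR transcript: return (keyword, word_index)
    pairs for every transcript word that matches the keyword set."""
    words = re.findall(r"\w+", transcript.lower())
    keyword_set = {k.lower() for k in keywords}
    return [(w, i) for i, w in enumerate(words) if w in keyword_set]

# Example: spotting two keywords in a short transcript.
hits = spot_keywords("The parliament debated the budget", ["parliament", "budget"])
# → [("parliament", 1), ("budget", 4)]
```

A transcript-level matcher like this only finds keywords the ASR already recognized correctly, which is why the cited work evaluates the ASR and KWS components together.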
“…In particular, the authors in [19] considered a Wav2vec2.0 model combined with their proposed language modeling approach and achieved state-of-the-art results on the German Common Voice corpus, with a WER of 3.7%. Wav2vec2.0-based models have also been successfully tested in more adverse acoustic environments, such as the multimedia Portuguese data from the CORAA database [20]. For these reasons, Wav2vec2.0 has become one of the most widely adopted neural models for ASR.…”
Section: Introduction (mentioning)
confidence: 99%
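The word error rate (WER) cited above is the standard ASR metric: the word-level edit distance (substitutions, insertions, deletions) between reference and hypothesis, divided by the reference length. A minimal self-contained implementation, independent of the cited works:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # One-row dynamic-programming edit distance over words.
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag = row[0]
        row[0] = i
        for j, h in enumerate(hyp, 1):
            cur = row[j]
            row[j] = min(row[j] + 1,        # deletion
                         row[j - 1] + 1,    # insertion
                         prev_diag + (r != h))  # substitution/match
            prev_diag = cur
    return row[len(hyp)] / len(ref)

# One substituted word out of three reference words → WER = 1/3.
print(wer("the cat sat", "the bat sat"))
```

In practice, libraries such as jiwer are commonly used for this computation, often with text normalization applied first.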
“…The fine-tuning aims to train the model for the task of speech recognition in Portuguese. Due to hardware limitations, a model made available by the scientific community [Junior et al 2021] was used, which was trained on approximately 290 hours of Portuguese audio accompanied by its transcriptions. Training lasted 20 epochs.…”
Section: Prediction with Wav2vec2 (unclassified)
“…This section presents four ASR models evaluated for the task of transcribing NURC/SP audio: [Candido Junior et al 2021], [Ferreira and Oliveira 2022], [Grosman 2022], and [Stefanel Gris et al 2022]. These works are based on Wav2vec 2.0 and were trained for Portuguese speech recognition using the corpora presented in Section 2.2.2.…”
Section: Trained Models for Portuguese (mentioning)
confidence: 99%
“…While the NURC-Recife corpus took seven years to be processed and made publicly available, we hope that using speech processing tools will help bring NURC/SP quickly to digital life. Examples of such systems include the automatic speech recognizer (ASR) chosen in this study (the model trained with the CORAA ASR corpus [Candido Junior et al 2021]), the aeneas forced aligner used to synchronize audio and transcription, and the forced phonetic aligner Alinha-PB used in conjunction with a prosody-based utterance segmentation method.…”
Section: Introduction (mentioning)
confidence: 99%
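A forced aligner, such as the aeneas tool mentioned above, assigns each transcript unit a start/end time within the audio. The sketch below is a deliberately naive stand-in for illustration only (it is not aeneas, which aligns against synthesized speech): it distributes a known audio duration across words in proportion to their character length.

```python
from typing import List, Tuple

def uniform_align(words: List[str], audio_duration: float) -> List[Tuple[str, float, float]]:
    """Toy 'alignment': assign each word a time span proportional to
    its character length within the total audio duration."""
    total_chars = sum(len(w) for w in words)
    spans, t = [], 0.0
    for w in words:
        dur = audio_duration * len(w) / total_chars
        spans.append((w, round(t, 3), round(t + dur, 3)))
        t += dur
    return spans

# Two equal-length words over 2 seconds get one second each.
print(uniform_align(["ab", "ab"], 2.0))
# → [("ab", 0.0, 1.0), ("ab", 1.0, 2.0)]
```

Real aligners refine such spans using acoustic evidence; the output format (unit, start, end) is the part this sketch shares with them.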