2022
DOI: 10.48550/arxiv.2207.09152
Preprint

Benchmarking Transformers-based models on French Spoken Language Understanding tasks

Oralie Cattan,
Sahar Ghannay,
Christophe Servan
et al.

Abstract: In the last five years, the rise of self-attentional Transformer-based architectures has led to state-of-the-art performance on many natural language processing tasks. Although these approaches are increasingly popular, they require large amounts of data and computational resources. There is still a substantial need for benchmarking methodologies on under-resourced languages in data-scarce application conditions. Most pre-trained language models have been massively studied using the English language, and only a …

Cited by 1 publication (1 citation statement)
References 16 publications
“…Pre-training Compute and CO2 Impact: Our model was trained for 8 days on 6 A40 GPUs, compared to CamemBERT which was trained on 256 V100 GPUs for one day, which is roughly equivalent to 28 days of training on 6 A40 GPUs, since an NVIDIA A40 GPU is about 1.5x faster than a V100 GPU on language modeling tasks according to recent benchmarks. Following the reports by Luccioni et al. (2022) and Cattan et al. (2022) on the environmental impact of language model training, we use Lannelongue et al.'s (2021) online carbon footprint calculator to provide the following estimates: CamemBERTa's pre-training used 700 kWh and emitted 36 kg CO2, compared to 3.32 MWh and 170 kg for CamemBERT.…”
Section: Pre-training Dataset Choice
Citation type: mentioning (confidence: 99%)
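The compute comparison in the quoted statement is plain arithmetic, and it checks out. Below is a minimal Python sketch that reproduces it; the GPU counts, days, energy, and emission figures are taken directly from the quote, while the 1.5x A40-vs-V100 speedup is the benchmark ratio the citing authors assume, not a value from this page.

# Sanity-check of the compute-equivalence arithmetic in the citation statement.
# All figures come from the quote; A40_SPEEDUP is the citing authors' assumption.

V100_GPUS, V100_DAYS = 256, 1   # CamemBERT pre-training budget
A40_GPUS, A40_DAYS = 6, 8       # CamemBERTa pre-training budget
A40_SPEEDUP = 1.5               # assumed A40/V100 throughput ratio

# Express CamemBERT's budget in A40 GPU-days, then spread it over 6 GPUs.
camembert_on_6_a40 = (V100_GPUS * V100_DAYS / A40_SPEEDUP) / A40_GPUS
print(f"CamemBERT equivalent: {camembert_on_6_a40:.1f} days on 6 A40s")  # ~28.4

# Carbon intensity implied by each quoted energy/emission pair (g CO2 per kWh).
for name, kwh, kg_co2 in [("CamemBERTa", 700, 36), ("CamemBERT", 3320, 170)]:
    print(f"{name}: {1000 * kg_co2 / kwh:.0f} g CO2/kWh")

Both quoted energy/emission pairs imply roughly 51 g CO2 per kWh, so the two estimates are internally consistent and in line with a low-carbon electricity grid such as France's.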