2021
DOI: 10.48550/arxiv.2106.03193
Preprint

The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

Abstract: One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either lack good coverage of low-resource languages, consider only restricted domains, or are low quality because they are constructed using semi-automatic procedures. In this work, we introduce the FLORES-101 evaluation benchmark, consisting of 3001 sentences extracted from English Wikipedia and covering a variety of different topics and…

Cited by 5 publications (6 citation statements)
References 26 publications
“…For source transcripts, we reuse the transcripts produced by human translators from [20]. We maintain the English translated transcripts, which are useful for tasks such as multi-modal speech translation evaluations.…”
Section: Textual Data
confidence: 99%
“…In machine translation, the release of new benchmarks like FLoRes-101 [20] has enabled advances in publicly available massively multilingual machine translation systems [21]. With FLEURS, we hope to provide a resource that could catalyze research towards building massively multilingual speech and text representations and their evaluation on a variety of tasks.…”
Section: Introduction
confidence: 99%
“…At first, we translate the ChAII dataset from Hindi and Tamil to English and then to Bengali, Marathi, Malayalam, and Telugu. In the FLORES devset benchmark (Goyal et al., 2021), the BLEU scores of IndicTrans for translating Hindi and Tamil to English are 37.9 and 28.6, respectively. The scores (Radford et al., 2021) for translating English to Bengali, Marathi, Malayalam, and Telugu are 20.3, 16.1, 16.3, and 22.0, respectively.…”
Section: Translation and Transliteration Details
confidence: 99%
“…Accompanied by the increase of publicly released parallel corpus such as FLORES-101 (Goyal et al, 2021) and AI hub, the importance of evaluating and improving the quality of the parallel corpus becomes higher. Especially for the data construction process, assessing the quality of the corpus is regarded as an essential process.…”
Section: Parallel Corpus Quality Assessment
confidence: 99%