2021
DOI: 10.48550/arxiv.2109.07684
Preprint

Language Models are Few-shot Multilingual Learners

Cited by 3 publications (3 citation statements)
References 0 publications

“…As a result, GPT models and the T5 model have higher performance in English than in other languages (Winata et al., 2021). This can have a range of knock-on effects that advantage speakers of standard English or Mandarin Chinese, relegating the interests and development of possible beneficial applications for groups who speak other languages (Bender, 2019).…”
Section: Problem (mentioning)
confidence: 99%
“…Current state-of-the-art LMs produce higher quality predictions when prompted in English or Mandarin Chinese (Brown et al., 2020; Du, 2021; Fedus et al., 2021; Rosset, 2020). While it has been shown that in some languages, few-shot training and fine-tuning can improve performance in GPT models (Brown et al., 2020) and the T5 model (Raffel et al., 2020), the performance in non-English languages remained lower than the performance in English (Winata et al., 2021). It may be the case that the architecture of current LMs is particularly well-suited to English, and less well suited to other languages (Bender, 2011; Hovy and Spruit, 2016; Ruder, 2020).…”
Section: Examples (mentioning)
confidence: 99%
“…Scaling language models using this architecture has proven to be a powerful and general strategy for improving generalization. This has led to the emergence of multi-task (Radford et al., 2019) and few-shot (Brown et al., 2020; Winata et al., 2021) models leveraging scale and compute (Sanh et al., 2021; Raffel et al., 2020). This plot compares the performance of three different models with different sizes (Text+Chem T5-base, Text+Chem T5-small, MolT5-base, MolT5-small, T5-base, and T5-small) on the task of converting SMILES to captions, using six different metrics: BLEU-2, BLEU-4, ROUGE-1, ROUGE-2, ROUGE-L, and METEOR.…”
Section: Introduction (mentioning)
confidence: 99%