2022
DOI: 10.48550/arxiv.2204.07580
Preprint

mGPT: Few-Shot Learners Go Multilingual

Cited by 10 publications (10 citation statements)
References 0 publications
“…Additionally, to facilitate a comprehensive comparison, we computed "word surprisal" for the stimuli text from the corpus of eye-tracking data on Chinese reading (Zhang et al., 2022). Two state-of-the-art large language models (LLMs) were used to estimate word surprisal: Chinese BERT (Cui et al., 2021) and multilingual GPT (Shliazhko et al., 2022). We can leverage pre-trained LLMs to estimate word probabilities and subsequently obtain word surprisals.…”
Section: A. Attention Types in Transformers (mentioning; confidence: 99%)
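Both statements describe the same pipeline: a pre-trained LM assigns each word a conditional probability, and surprisal is its negative log, surprisal(w) = -log2 P(w | context). Below is a minimal sketch of that computation, assuming the Hugging Face transformers API and the public "ai-forever/mGPT" checkpoint (neither is specified in the cited papers); it yields subword-token surprisals, which word-level studies typically sum over each word's tokens:

```python
# A sketch (not code from the cited papers) of estimating token surprisal
# with a pre-trained causal LM. The checkpoint name "ai-forever/mGPT" is
# the public Hugging Face release of mGPT; any causal LM can be swapped in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai-forever/mGPT")
model = AutoModelForCausalLM.from_pretrained("ai-forever/mGPT")
model.eval()

def token_surprisals(text: str):
    """Return (token, surprisal in bits) for every token after the first."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids=ids).logits          # (1, seq_len, vocab)
    # Log-probability assigned to each actually-occurring next token
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    next_ids = ids[0, 1:]                             # tokens being predicted
    token_logp = log_probs[torch.arange(next_ids.size(0)), next_ids]
    bits = (-token_logp / torch.log(torch.tensor(2.0))).tolist()  # nats -> bits
    return list(zip(tokenizer.convert_ids_to_tokens(next_ids.tolist()), bits))
```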
“…Fortunately, such databases and computational tools are available. For instance, multilingual large language models (LLMs; e.g., mBERT, Devlin et al., 2018; XLM, Conneau et al., 2019; mGPT, Shliazhko et al., 2022) are capable of processing and understanding text in multiple languages. Multilingual LLMs can estimate word surprisal precisely for each of these languages.…”
Section: Introduction (mentioning; confidence: 99%)
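A hypothetical usage of the token_surprisals sketch above for the multilingual case this statement describes; the sentences and language codes are illustrative, not taken from the cited work:

```python
# Illustrative multilingual scoring with the token_surprisals sketch above:
# one multilingual checkpoint scores text in each language directly.
samples = {
    "en": "The cat sat on the mat.",
    "ru": "Кошка сидела на коврике.",
    "zh": "猫坐在垫子上。",
}
for lang, sentence in samples.items():
    total_bits = sum(s for _, s in token_surprisals(sentence))
    print(f"{lang}: total surprisal {total_bits:.1f} bits")
```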
“…In March 2022, Google's DeepMind released Chinchilla with 70 Bp (billion parameters) [57], and BigScience published tr11-176-ml (only available in the GitHub repository). April brought PaLM with 540 Bp by Google [83], CodeGen with 16 Bp by Salesforce, VLM-4 by Salesforce [88], the 200 Bp Luminous by Aleph Alpha [52], the 13 Bp mGPT from Sber [107], and the 10 Bp Noor from TII [114]. The most groundbreaking model in April was Flamingo, produced by Google's DeepMind, because it combined a 70 Bp language model with a 10 Mb image model [3].…”
(mentioning; confidence: 99%)