2021
DOI: 10.48550/arxiv.2109.04650
Preprint

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

Abstract: GPT-3 shows remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billion scale data. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performances of different sized models, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, a Korean variant of 82B GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific toke…

Cited by 11 publications (12 citation statements). References 34 publications (29 reference statements).
“…About one year after GPT-3 was announced, a spike in similar model announcements followed. These models were developed by both large and small private organizations from around the world: Jurassic-1-Jumbo [46], AI21 Labs, Israel; Ernie 3.0 Titan [70], Baidu, China; Gopher [56], DeepMind, USA/UK; FLAN [71] & LaMDA [68], Google, USA; Pan Gu [78], Huawei, China; Yuan 1.0 [76], Inspur, China; Megatron Turing NLG [64], Microsoft & NVIDIA, USA; and HyperClova [43], Naver, Korea. This suggests that the economic incentives to build such models, and the prestige incentives to announce them, are quite strong.…”
Section: Large Language Models Are Rapidly Proliferating
confidence: 99%
“…Scaling up the amount of data, compute power, and model parameters of neural networks has recently led to the arrival (and real-world deployment) of capable generative models such as CLIP [55], Ernie 3.0 Titan [70], FLAN [71], Gopher [56], GPT-3 [11], HyperClova [43], Jurassic-1-Jumbo [46], Megatron Turing NLG [64], LaMDA [68], Pan Gu [78], Yuan 1.0 [76], and more. For this class of models, the relationship between scale and model performance is often so predictable that it can be described in a lawful relationship: a scaling law.…”
Section: Introduction
confidence: 99%
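As a rough illustration of the scaling-law form referenced above (a minimal sketch, not taken from the cited paper; the symbols are assumptions): test loss is commonly modeled as a power law in model size,

$$
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
$$

where $N$ is the number of (non-embedding) parameters and $N_c$, $\alpha_N$ are constants fitted to empirical measurements; analogous power laws are typically fitted for dataset size and training compute.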
“…Their follow-up study confirmed that the memorization capacity of LMs had a log-linear relationship with the model size [8]. Considering the current circumstance that GPT-based architectures are widely adopted as core engines in real applications [21], [22], the MI attacks against LMs are a substantial threat.…”
Section: Introduction
confidence: 86%
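For illustration only (a sketch of the log-linear relationship stated above; the notation is an assumption, not taken from [8]): the memorized fraction of training data $M$ can be modeled as

$$
M(N) \approx a \log N + b,
$$

where $N$ is the model size in parameters and $a$, $b$ are empirically fitted constants.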
“…Some common examples of GenAI systems are image generators (Midjourney or Stable Diffusion), chatbots (ChatGPT, Bard, PaLM), code generators (CodeX, Co-Pilot [133]), audio generators (VALL-E [134]), and video generators (Gen-2) [135]. During the past few years, GenAI model sizes have been scaled from a few million parameters (BERT [48], 110M) to hundreds of billions of parameters (GPT [136], 175B). Generally speaking, as the size of the model (number of parameters) increases, the performance of the model also increases [137], and it can generalize to a variety of tasks [138], for example, Foundation models [139]. However, smaller models can also be fine-tuned for a more focused task [140].…”
Section: Generative AI
confidence: 99%