2022
DOI: 10.48550/arxiv.2204.06745
Preprint

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

Abstract: We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B's architecture and training and evaluate its performance on a range of language-understanding, mathematics, and knowledge-based tasks. We…

Cited by 57 publications (73 citation statements)
References 0 publications
“…In 2021, a new model named Pangu-alpha (Zeng et al., 2021) was proposed: a large-scale autoregressive language model with about 200 billion parameters that performs various tasks extremely well under zero-shot or few-shot settings. In 2022, Black et al. introduced the autoregressive model GPT-NeoX-20B, which gains considerably more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models and serves as a particularly powerful few-shot reasoner (Black et al., 2022). In conclusion, autoregressive language models can outperform a number of baselines on many tasks.…”
Section: Auto-regressive Language Models
Citation type: mentioning; confidence: 99%
“…Sadly, both models are proprietary, so it is impossible to study them or perform further experiments. Open-source models of comparable size are GPT-Neo 2.7B [5], GPT-J 6B [6], and GPT-NeoX 20B [7], all developed by EleutherAI.…”
Section: Previous Work and Motivation
Citation type: mentioning; confidence: 99%
“…The success of OpenAI's GPT-3 model has encouraged open-source communities to develop large pre-trained language models. For example, EleutherAI has developed several well-performing autoregressive language models: GPT-Neo [5], GPT-J [6], and finally GPT-NeoX [7]. The last of these three has 20 billion parameters and performs remarkably well on several benchmarks, as shown in the original paper [7].…”
Section: Introduction
Citation type: mentioning; confidence: 99%
“…: given the prompt "maison → house, chat → cat, chien →", the model produces the completion "dog". This capability is quite intriguing, as it allows models to adapt to a wide range of downstream tasks on the fly, i.e., without the need to perform any parameter updates after the model is trained [Brown et al., 2020, Lieber et al., 2021, Rae et al., 2021, Black et al., 2022]. However, it is unclear to what extent these models have developed the ability to learn new tasks from in-context examples alone, as opposed to simply indexing into a vast set of known tasks from the training data (e.g., see Min et al. [2022]).…”
Section: Introduction
Citation type: mentioning; confidence: 99%
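To make the in-context learning pattern quoted above concrete, the following is a minimal sketch of few-shot prompting with the released weights. It assumes the Hugging Face transformers library and the EleutherAI/gpt-neox-20b checkpoint on the Hugging Face Hub; any causal language model checkpoint would be used the same way, and the completion noted in the final comment is the expected behaviour, not a guarantee.

```python
# Minimal sketch: few-shot (in-context) prompting, with no parameter updates.
# Assumption: the transformers library and the EleutherAI/gpt-neox-20b checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# The task (French -> English translation) is specified entirely by the prompt,
# exactly as in the quoted example: the model must infer it from the examples.
prompt = "maison -> house\nchat -> cat\nchien ->"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=2, do_sample=False)
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens))  # ideally " dog"
```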