2022
DOI: 10.48550/arxiv.2204.06745
Preprint

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

Abstract: We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B's architecture and training and evaluate its performance on a range of language-understanding, mathematics, and knowledge-based tasks. We…

Cited by 57 publications (73 citation statements)
References 0 publications
“…In 2021, a new model named Pangu-alpha (Zeng et al., 2021) was proposed: a large-scale autoregressive language model with about 200 billion parameters that performs various tasks extremely well under zero-shot or few-shot settings. In 2022, Black et al. introduced the autoregressive model GPT-NeoX-20B, which gains considerably more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models and serves as a particularly powerful few-shot reasoner (Black et al., 2022). In conclusion, autoregressive language models can outperform a number of baselines on many tasks.…”
Section: Auto-regressive Language Models
Citation type: mentioning; confidence: 99%
“…Sadly, both models are proprietary, so it is impossible to study them or perform further experiments. Open-source models of comparable size are GPT-Neo 2.7B [5], GPT-J 6B [6], and GPT-NeoX 20B [7], all developed by EleutherAI.…”
Section: Previous Work and Motivation
Citation type: mentioning; confidence: 99%
“…The success of OpenAI's GPT-3 model has encouraged open-source communities to develop large pre-trained language models. For example, EleutherAI has developed several well-performing autoregressive language models: GPT-Neo [5], GPT-J [6], and finally GPT-NeoX [7]. The last of these three has 20 billion parameters and performs remarkably well on several benchmarks, as shown in the original paper [7].…”
Section: Introduction
Citation type: mentioning; confidence: 99%
“…: given the prompt "maison → house, chat → cat, chien →", the model produces the completion "dog". This capability is quite intriguing, as it allows models to adapt to a wide range of downstream tasks on the fly, i.e., without the need to perform any parameter updates after the model is trained [Brown et al., 2020, Lieber et al., 2021, Rae et al., 2021, Black et al., 2022]. However, it is unclear to what extent these models have developed the ability to learn new tasks from in-context examples alone, as opposed to simply indexing into a vast set of known tasks from the training data (e.g., see Min et al. [2022]).…”
Section: Introduction
Citation type: mentioning; confidence: 99%
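To make the in-context learning pattern quoted above concrete, the following is a minimal sketch of few-shot prompting with the released weights. It assumes the Hugging Face transformers library and the EleutherAI/gpt-neox-20b checkpoint on the Hugging Face Hub; any causal language model checkpoint would be used the same way, and the completion noted in the final comment is the expected behaviour, not a guarantee.

```python
# Minimal sketch: few-shot (in-context) prompting, with no parameter updates.
# Assumption: the transformers library and the EleutherAI/gpt-neox-20b checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# The task (French -> English translation) is specified entirely by the prompt,
# exactly as in the quoted example: the model must infer it from the examples.
prompt = "maison -> house\nchat -> cat\nchien ->"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=2, do_sample=False)
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens))  # ideally " dog"
```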