2022
DOI: 10.48550/arxiv.2203.15556
Preprint

Training Compute-Optimal Large Language Models

Abstract: We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of …
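The abstract's headline result is that model size and training tokens should grow in equal proportion with the compute budget. A minimal sketch of that allocation rule is below, assuming the commonly used approximation C ≈ 6·N·D for training FLOPs and the roughly 20-tokens-per-parameter ratio reported for compute-optimal models; the function name and constants here are illustrative, not the paper's code.

```python
# Minimal sketch (not the authors' code): split a FLOP budget C between
# parameters N and training tokens D, assuming C ≈ 6 * N * D and the
# paper's finding that N and D should scale in equal proportion,
# with roughly 20 tokens per parameter at the compute-optimal point.

import math

TOKENS_PER_PARAM = 20  # approximate compute-optimal ratio (assumption)


def compute_optimal_allocation(flop_budget: float) -> tuple[float, float]:
    """Return (n_params, n_tokens) for a given FLOP budget,
    solving C = 6 * N * D with D = TOKENS_PER_PARAM * N."""
    n_params = math.sqrt(flop_budget / (6 * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Roughly Chinchilla's budget: 6 * 70e9 params * 1.4e12 tokens of compute.
    budget = 6 * 70e9 * 1.4e12
    n, d = compute_optimal_allocation(budget)
    print(f"~{n / 1e9:.0f}B parameters, ~{d / 1e12:.1f}T tokens")
    # -> ~70B parameters, ~1.4T tokens
```

Because both N and D scale as the square root of C under this rule, a 10× larger compute budget buys roughly 3.2× more parameters and 3.2× more tokens, rather than being spent almost entirely on model size.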

Cited by 167 publications (256 citation statements)
References 25 publications
“…This trend has been justified by the findings of Kaplan et al (2020), who show that language modelling performance is strongly correlated with model size. Recently, Hoffmann et al (2022) have refined these findings, showing that the number of data tokens should scale at the same rate as the model size to maximise computational efficiency. Based on these findings, they introduced the Chinchilla family of models, which we build upon, using the 70B parameter Chinchilla model as the base LM for our largest Flamingo model.…”
Section: Language Modelling
confidence: 96%
“…(b) Architectural innovations and training strategies that effectively leverage large pretrained vision-only and language-only models, preserving the benefits of these initial models while efficiently fusing the modalities. Starting from Chinchilla, a 70B state-of-the-art LM (Hoffmann et al, 2022), we train Flamingo, an 80B parameter VLM. (c) Efficient ways to adapt to visual inputs of varying size, making Flamingo applicable to images and videos.…”
Section: Contributions
confidence: 99%