2022
DOI: 10.48550/arxiv.2210.11399
Preprint

Transcending Scaling Laws with 0.1% Extra Compute

Cited by 7 publications (6 citation statements)
References 0 publications
“…UL2-20B, when tested with FLAN instruction tuning, achieved a competitive score compared to FLAN-PaLM 62B on MMLU and Big-Bench benchmarks. After using the MoD objective, U-PaLM [76] achieved the same performance as PaLM-540B but with only half of its computational budget.…”
Section: Regular Denoising
confidence: 90%
“…One major improvement in the advancement of LLMs is using instruction tuning [25]. U-PaLM [26] significantly increases zero-shot performance of PaLM with only 0.1% extra compute, by applying the mixture of denoising training objective from UL2 [27] to a pretrained PaLM model. Flan-PaLM [28] further improves on that by using both instruction-tuning and chain-of-thought prompting.…”
Section: Related Work
confidence: 99%
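For context on the mixture-of-denoisers (MoD) objective referenced in the passage above: UL2 trains on a blend of denoising modes (regular, extreme, and sequential denoisers signalled by mode tokens), and U-PaLM continues training a pretrained PaLM model on this mixture for a small additional compute budget. The Python sketch below only illustrates how such denoising examples can be constructed at the data level; the mode names, sentinel tokens, corruption rates, and span lengths are placeholder assumptions, not the configuration used in the paper.

```python
import random

# Illustrative MoD-style example construction (assumed hyperparameters).
# Expected corrupted fraction per example is roughly corrupt_rate.
MODES = {
    "[R]": dict(span_len=3, corrupt_rate=0.15),    # regular denoising: short spans, low rate
    "[X]": dict(span_len=12, corrupt_rate=0.50),   # extreme denoising: long spans / high rate
    "[S]": dict(span_len=None, corrupt_rate=0.25), # sequential denoising: predict a suffix
}

def make_denoising_example(tokens, mode):
    cfg = MODES[mode]
    if mode == "[S]":
        # Sequential (prefix-LM style): keep a prefix, predict the remaining suffix.
        split = int(len(tokens) * (1 - cfg["corrupt_rate"]))
        inputs = [mode] + tokens[:split] + ["<extra_id_0>"]
        targets = ["<extra_id_0>"] + tokens[split:]
        return inputs, targets
    # Span corruption: replace random spans with sentinel tokens the model must fill in.
    inputs, targets, i, sid = [mode], [], 0, 0
    while i < len(tokens):
        if random.random() < cfg["corrupt_rate"] / cfg["span_len"]:
            span = tokens[i:i + cfg["span_len"]]
            sentinel = f"<extra_id_{sid}>"
            inputs.append(sentinel)
            targets += [sentinel] + span
            i += cfg["span_len"]
            sid += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

tokens = "scaling language models improves quality but costs compute".split()
mode = random.choice(list(MODES))  # sample one denoiser per example
print(mode, *make_denoising_example(tokens, mode), sep="\n")
```

Each training example is tagged with its mode token, so a single model learns all denoising variants from one data stream; this is the sense in which U-PaLM can reuse an existing causal checkpoint with only a small amount of extra compute.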
“…In this section, we delve deeper into the feasibility of simultaneously optimizing the two distinct objectives. Unlike existing unified pre-training frameworks [16,17,18,40,41] which employ analogous formulations to pre-train various objectives, we explore how to extend the conclusions drawn from a similar training format to a broader setting. Specifically, we investigate if a model optimized via in-place token predictions benefits the one trained via the next-token prediction regime, and vice versa.…”
Section: The Empirical Analysis Of Unified Training
confidence: 99%
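The passage above contrasts in-place token prediction (recovering the original tokens at corrupted positions) with next-token prediction (causal language modelling). As a rough illustration of how the two losses differ at the tensor level, here is a hedged PyTorch sketch; the embedding-plus-linear "model", mask rate, and mask id are stand-ins for illustration, not anything from the cited work.

```python
import torch
import torch.nn.functional as F

vocab, d = 100, 32
emb = torch.nn.Embedding(vocab, d)      # toy stand-in for a transformer body
head = torch.nn.Linear(d, vocab)        # output projection to vocabulary logits
ids = torch.randint(0, vocab, (1, 16))  # one toy sequence of token ids

# Next-token prediction: the representation at position t predicts token t+1.
h = emb(ids)
ntp_loss = F.cross_entropy(head(h[:, :-1]).reshape(-1, vocab),
                           ids[:, 1:].reshape(-1))

# In-place prediction: corrupt some positions, predict the original tokens there.
mask = torch.rand(ids.shape) < 0.15
mask[:, 0] = True                        # guarantee at least one masked position
corrupted = ids.masked_fill(mask, 0)     # token id 0 acts as a stand-in [MASK]
h = emb(corrupted)
inplace_loss = F.cross_entropy(head(h[mask]), ids[mask])

print(float(ntp_loss), float(inplace_loss))
```

The key structural difference the quoted analysis examines is visible here: the next-token loss is computed at every position against a shifted target, while the in-place loss is computed only at corrupted positions against the uncorrupted originals.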