Henryk Michalewski scite author profile

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies. * Equal Contribution. Author contributions and ordering details are listed in Appendix A.


Program Synthesis with Large Language Models

Austin¹,

Odena²,

Nye³

et al. 2021

Preprint

207


Small Valdivia compact spaces

Kubiś

Michalewski

2006

Topology and its Applications


We prove a preservation theorem for the class of Valdivia compact spaces, which involves inverse sequences of "simple" retractions. Consequently, a compact space of weight $\leq\aleph_1$ is Valdivia compact iff it is the limit of an inverse sequence of metric compacta whose bonding maps are retractions. As a corollary, we show that the class of Valdivia compacta of weight at most $\aleph_1$ is preserved both under retractions and under open 0-dimensional images. Finally, we characterize the class of all Valdivia compacta in the language of category theory, which implies that this class is preserved under all continuous weight preserving functors.


Show Your Work: Scratchpads for Intermediate Computation with Language Models

Nye¹,

Andreassen²,

Gur-Ari³

et al. 2021

Preprint


Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text (Brown et al., 2020) or synthesizing computer programs (Austin et al., 2021). However, they struggle with tasks that require unbounded multi-step computation, such as adding integers (Brown et al., 2020) or executing programs (Austin et al., 2021). Surprisingly, we find that these same models are able to perform complex multi-step computations, even in the few-shot regime, when asked to perform the operation "step by step", showing the results of intermediate computations. In particular, we train Transformers to perform multi-step computations by asking them to emit intermediate computation steps into a "scratchpad". On a series of increasingly complex tasks ranging from long addition to the execution of arbitrary programs, we show that scratchpads dramatically improve the ability of language models to perform multi-step computations.


Learning to Run Challenge Solutions: Adapting Reinforcement Learning Methods for Neuromusculoskeletal Environments

Kidziński

Mohanty

Ong

et al. 2018


In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each of the eight teams implemented different modifications of the known algorithms.


