Adam Roberts scite author profile

PaLM: Scaling Language Modeling with Pathways

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.


LaMDA: Language Models for Dialog Applications

Thoppilan, Freitas, Hall et al. 2022. Preprint

133

187


We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it yields smaller improvements in safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding. The first challenge, safety, involves ensuring that the model's responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of human values, and we find that filtering candidate responses using a LaMDA classifier fine-tuned with a small amount of crowdworker-annotated data offers a promising approach to improving model safety. The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible. Finally, we explore the use of LaMDA in the domains of education and content recommendations, and analyze their helpfulness and role consistency.


Extracting Training Data from Large Language Models

Carlini, Tramèr, Wallace et al. 2020. Preprint


It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model. We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences is included in just one document in the training data. We comprehensively evaluate our extraction attack to understand the factors that contribute to its success. For example, we find that larger models are more vulnerable than smaller models. We conclude by drawing lessons and discussing possible safeguards for training large language models.


How Much Knowledge Can You Pack Into the Parameters of a Language Model?

Roberts, Raffel, Shazeer 2020. Preprint


It has recently been observed that neural language models trained on unstructured text can implicitly store and retrieve knowledge using natural language queries. In this short paper, we measure the practical utility of this approach by fine-tuning pre-trained models to answer questions without access to any external context or knowledge. We show that this approach scales surprisingly well with model size and outperforms models that explicitly look up knowledge on the open-domain variants of Natural Questions and WebQuestions.


Current tidal power technologies and their suitability for applications in coastal and marine areas

Roberts, Thomas, Sewell et al. 2016. J. Ocean Eng. Mar. Energy

109


A considerable body of research is currently being performed to quantify available tidal energy resources and to develop efficient devices with which to harness them. This work is naturally focussed on maximising power generation from the most promising sites, and a review of the literature suggests that the potential for smaller scale, local tidal power generation from shallow near-shore sites has not yet been investigated. If such generation is feasible, it could provide sustainable electricity for coastal homes and communities as part of a distributed generation strategy, and would benefit from easier installation and maintenance, lower cabling and infrastructure requirements and reduced capital costs when compared with larger scale projects. This article reviews tidal barrages and lagoons, tidal turbines, oscillating hydrofoils and tidal kites to assess their suitability for smaller scale electricity generation in the shallower waters of coastal areas at the design stage. This is achieved by discussing the power density, scalability, durability, maintainability, economic potential and environmental impacts of each concept. The discussion suggests that tidal kites and range devices are not well suited to small-scale shallow water applications due to depth and size requirements, respectively. Cross-flow turbines appear to be the most suitable technology, as they have high power densities and a maximum size that is not constrained by water depth. Oscillating hydrofoils would also be appropriate, provided comparable levels of efficiency can be achieved.


ByT5: Towards a token-free future with pre-trained byte-to-byte models

Xue, Barua, Constant et al. 2021. Preprint


Do Transformer Modifications Transfer Across Implementations and Applications?

Narang, Chung, Tay et al. 2021. Preprint



Adam Roberts

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

PaLM: Scaling Language Modeling with Pathways

LaMDA: Language Models for Dialog Applications

Extracting Training Data from Large Language Models

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

Current tidal power technologies and their suitability for applications in coastal and marine areas

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Do Transformer Modifications Transfer Across Implementations and Applications?
