Ofir Press scite author profile

We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
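The weight tying described in this abstract can be illustrated with a minimal sketch. The toy model below is an assumption made for illustration, not the authors' implementation: the class name `TinyLM`, its parameters, and the single-step "embed, then score every word" design are all hypothetical. It only shows what sharing one matrix between the input and output embedding means, and why doing so reduces the parameter count.

```python
import random


class TinyLM:
    """Hypothetical toy language model: embed one token, then score every vocabulary word."""

    def __init__(self, vocab_size, dim, tie_weights=True, seed=0):
        rng = random.Random(seed)

        def init_matrix():
            # vocab_size rows, one dim-sized vector per word
            return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                    for _ in range(vocab_size)]

        self.input_embedding = init_matrix()
        # Tying: the output embedding is the very same object as the input
        # embedding, so any update to one is an update to the other.
        self.output_embedding = (
            self.input_embedding if tie_weights else init_matrix()
        )

    def logits(self, token_id):
        # Input side: look up the row for the token.
        hidden = self.input_embedding[token_id]
        # Output side: dot the hidden vector with every row of the output matrix.
        return [sum(w * h for w, h in zip(row, hidden))
                for row in self.output_embedding]

    def num_parameters(self):
        # Count each distinct matrix once, so a shared matrix is not double counted.
        matrices = {id(m): m for m in (self.input_embedding, self.output_embedding)}
        return sum(len(m) * len(m[0]) for m in matrices.values())


tied = TinyLM(vocab_size=10, dim=4, tie_weights=True)
untied = TinyLM(vocab_size=10, dim=4, tie_weights=False)
print(tied.num_parameters(), untied.num_parameters())
```

In this toy setting the tied model stores one embedding matrix instead of two, which is the source of the size reduction the abstract reports for translation models.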


Additive Manufacturing of Transparent Silica Glass from Solutions

Cooperstein

Shukrun

Press

et al. 2018

ACS Appl. Mater. Interfaces

112


A sol, aqueous solution-based ink is presented for fabrication of 3D transparent silica glass objects with complex geometries, by a simple 3D printing process conducted at room temperature. The ink combines a hybrid ceramic precursor that can undergo both the photopolymerization reaction and a sol-gel process, both in the solution form, without any particles. The printing is conducted by localized photopolymerization with the use of a low-cost 3D printer. Following printing, upon aging and densifying, the resulting objects convert from a gel to a xerogel and then to a fused silica. The printed objects, which are composed of fused silica, are transparent and have tunable density and refractive indices.


BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Scao

Fan

Akiki

et al. 2022

Preprint

114


Using the Output Embedding to Improve Language Models

Press

Wolf

2016

Preprint


Improving Transformer Models by Reordering their Sublayers

Press

Smith

Levy

2020


Multilayer transformer networks consist of interleaved self-attention and feedforward sublayers. Could ordering the sublayers in a different pattern lead to better performance? We generate randomly ordered transformers and train them with the language modeling objective. We observe that some of these models are able to achieve better performance than the interleaved baseline, and that those successful variants tend to have more self-attention at the bottom and more feedforward sublayers at the top. We propose a new transformer pattern that adheres to this property, the sandwich transformer, and show that it improves perplexity on multiple word-level and character-level language modeling benchmarks, at no cost in parameters, memory, or training time. However, the sandwich reordering pattern does not guarantee performance gains across every task, as we demonstrate on machine translation models. Instead, we suggest that further exploration of task-specific sublayer reorderings is needed in order to unlock additional gains.
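The sublayer reordering described in this abstract can be sketched as a string of sublayer types. The function below is a hypothetical illustration: the name `sandwich_order`, the parameter `k`, and the specific pattern (k self-attention sublayers, then interleaved pairs, then k feedforward sublayers) are assumptions, since the abstract itself only says that successful orderings have more self-attention at the bottom and more feedforward sublayers at the top.

```python
def sandwich_order(n, k):
    """Return a bottom-to-top ordering of 2n sublayers as a string.

    's' stands for a self-attention sublayer and 'f' for a feedforward one.
    k is an assumed knob: how many sublayers of each type are pulled out of
    the interleaved middle and stacked at the bottom ('s') and the top ('f').
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return "s" * k + "sf" * (n - k) + "f" * k


baseline = sandwich_order(4, 0)   # fully interleaved ordering
sandwich = sandwich_order(4, 2)   # self-attention heavy bottom, feedforward heavy top
print(baseline, sandwich)

# Both orderings use the same number of each sublayer type, so this
# reordering leaves the parameter count unchanged.
assert baseline.count("s") == sandwich.count("s")
assert baseline.count("f") == sandwich.count("f")
```

With `k = 0` the function reproduces the interleaved baseline, so a single parameter moves between the standard ordering and progressively more "sandwiched" variants.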


