Krzysztof Maziarz scite author profile

The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic and performance challenges. In this work, we address these challenges and finally realize the promise of conditional computation, achieving greater than 1000x improvements in model capacity with only minor losses in computational efficiency on modern GPU clusters. We introduce a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks. A trainable gating network determines a sparse combination of these experts to use for each example. We apply the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora. We present model architectures in which a MoE with up to 137 billion parameters is applied convolutionally between stacked LSTM layers. On large language modeling and machine translation benchmarks, these models achieve significantly better results than state-of-the-art at lower computational cost.

* Equally major contributors
† Work done as a member of the Google Brain Residency program (g.co/brainresidency)
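The sparse gating idea in the abstract above can be illustrated with a minimal sketch. It assumes plain top-k softmax gating over randomly initialized weights, which is a simplification for illustration and not necessarily the gating function used in the paper. All names here (`top_k_gate`, `linear`, `moe_forward`) are hypothetical.

```python
import math
import random


def top_k_gate(logits, k):
    """Return {expert_index: weight} for the k largest logits, softmax-normalized."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def linear(weights, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def moe_forward(x, gate_weights, experts, k=2):
    """Sparse mixture: only the k selected experts are evaluated for this example."""
    gates = top_k_gate(linear(gate_weights, x), k)
    out = [0.0] * len(experts[0])
    for idx, g in gates.items():
        # Assumption: each expert is a single linear layer followed by ReLU.
        y = [max(0.0, v) for v in linear(experts[idx], x)]
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, gates


# Toy setup with random (untrained) weights, for illustration only.
rng = random.Random(0)
dim, n_experts = 4, 8
gate_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
           for _ in range(n_experts)]
x = [rng.gauss(0, 1) for _ in range(dim)]
output, gates = moe_forward(x, gate_weights, experts, k=2)
```

In this sketch, computation per example scales with `k` rather than with the total number of experts, which is the property the abstract describes as increasing capacity without a proportional increase in computation.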


Learning to Extend Molecular Scaffolds with Structural Motifs

Maziarz¹, Jackson-Flux², Cameron³ et al. 2021

Preprint


Recent advancements in deep learning-based modeling of molecules promise to accelerate in silico drug discovery. There is a plethora of generative models available, which build molecules either atom-by-atom and bond-by-bond or fragment-by-fragment. Many drug discovery projects also require a fixed scaffold to be present in the generated molecule, and incorporating that constraint has been recently explored. In this work, we propose a new graph-based model that learns to extend a given partial molecule by flexibly choosing between adding individual atoms and entire fragments. Extending a scaffold is implemented by using it as the initial partial graph, which is possible because our model does not depend on generation history. We show that training using a randomized generation order is necessary for good performance when extending scaffolds, and that the results are further improved by increasing fragment vocabulary size. Our model pushes the state-of-the-art of graph-based molecule generation, while being an order of magnitude faster to train and sample from than existing approaches.


Retrosynthetic Planning with Dual Value Networks

Liu¹, Xue², Xie³ et al. 2023

Preprint


Evolutionary-Neural Hybrid Agents for Architecture Search

Maziarz¹, Tan², Khorlin³ et al. 2018

Preprint


Neural Architecture Search has recently shown potential to automate the design of Neural Networks. Deep Reinforcement Learning agents can learn complex architectural patterns, as well as explore a vast and compositional search space. On the other hand, evolutionary algorithms offer the sample efficiency needed for such a resource intensive application. We propose a class of Evolutionary-Neural hybrid agents (Evo-NAS), that retain the qualities of the two approaches. We show that the Evo-NAS agent outperforms both Neural and Evolutionary agents when applied to architecture search for a suite of text and image classification benchmarks. On a high-complexity architecture search space for image classification, the Evo-NAS agent surpasses the accuracy achieved by commonly used agents with only 1/3 of the search cost.


Holistic Multi-View Building Analysis in the Wild with Projection Pooling

Wojna¹, Maziarz², Jocz³ et al. 2020

Preprint


We address six different classification tasks related to fine-grained building attributes: construction type, number of floors, pitch, and geometry of the roof, facade material, and occupancy class. Tackling such a problem of remote building analysis became possible only recently due to growing large-scale datasets of urban scenes. To this end, we introduce a new benchmarking dataset, consisting of 49426 top-view and street-view images of 9674 buildings. These photos are further assembled, together with the geometric metadata. The dataset showcases a variety of real-world challenges, such as occlusions, blur, partially visible objects, and a broad spectrum of buildings. We propose a new projection pooling layer, creating a unified, top-view representation of the top-view and the side views in a high-dimensional space. It allows us to utilize the building and imagery metadata seamlessly. Introducing this layer improves classification accuracy compared to highly tuned baseline models, indicating its suitability for building analysis.


