Zhao Zhang scite author profile

AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (i) tackle new tasks, like protein-ligand complex structure prediction, (ii) investigate the process by which the model learns, which remains poorly understood, and (iii) assess the model's generalization capacity to unseen regions of fold space. Here we report OpenFold, a fast, memory-efficient, and trainable implementation of AlphaFold2, and OpenProteinSet, the largest public database of protein multiple sequence alignments. We use OpenProteinSet to train OpenFold from scratch, fully matching the accuracy of AlphaFold2. Having established parity, we assess OpenFold's capacity to generalize across fold space by retraining it using carefully designed datasets. We find that OpenFold is remarkably robust at generalizing despite extreme reductions in training set size and diversity, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced by OpenFold during training, we also gain surprising insights into the manner in which the model learns to fold proteins, discovering that spatial dimensions are learned sequentially. Taken together, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial new resource for the protein modeling community.

show abstract

Fast Deep Neural Network Training on Distributed Systems and Cloud TPUs

You

Zhang²,

Hsieh

et al. 2019

IEEE Trans. Parallel Distrib. Syst.

View full text Add to dashboard Cite

Kira: Processing Astronomy Imagery Using Big Data Technology

Zhang

Barbary

Nothaft

et al. 2020

IEEE Trans. Big Data

View full text Add to dashboard Cite

Application Skeleton: Generating Synthetic Applications for Infrastructure Research

Zhang¹,

Merzky²,

Turilli³

et al. 2016

JOSS

View full text Add to dashboard Cite

SummaryApplication Skeleton is a simple and powerful tool to build simplified synthetic science and engineering applications (for example, modeling and simulation, data analysis) with runtime and I/O close to that of the real applications. It is intended for applied computer scientists who need to use science and engineering applications to verify the effectiveness of new systems designed to efficiently run such applications, so that they can bypass obstacles that they often encounter when accessing and building real science and engineering applications. Using the applications generated by Application Skeleton guarantees that the CS systems' effectiveness on synthetic applications will apply to the real applications.Application Skeleton can generate bag-of-task, (iterative) map-reduce, and (iterative) multistage workflow applications. These applications are represented as a set of tasks, a set of input files, and a set of dependencies. These applications can be generally considered many-task applications, and once created, can be run on single-core, single-node, multi-core, or multi-node (distributed or parallel) computers, depending on what workflow system is used to run them. The generated applications are compatible with workflow system such as Swift (Zhao et al. 2007, Wilde et al. (2009), Wilde et al. (2011) and Pegasus (Ewa Deelman et al. 2004, E. Deelman et al. (2005), as well as the ubiquitous UNIX shell. The application can also be created as a generic JSON object that can be used by other systems such as the AIMES (Turilli et al. 2015) middleware.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Zhao Zhang

OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization

Fast Deep Neural Network Training on Distributed Systems and Cloud TPUs

Kira: Processing Astronomy Imagery Using Big Data Technology

Application Skeleton: Generating Synthetic Applications for Infrastructure Research

Contact Info

Product

Resources

About