Fast and flexible design of novel proteins using graph neural networks

Strokach, Alexey; Becerra, David; Corbi‐Verge, Carles; Perez-Riba, Albert; Kim, Philip M.

doi:10.1101/868935

Cited by 22 publications

(19 citation statements)

References 46 publications

“…Previous machine learning models for the task of residue prediction conditioned on chemical environment have only been applied to single-shot residue prediction or architecture class, secondary structure, or ∆∆G prediction [60,89,62,63,61]. Other deep learning approaches have been developed for sequence design or rotamer packing [90,91,92,93,94,95], but most of these methods' designs have not been comprehensively validated by a range of biochemical metrics or by folding in silico or in vitro.…”

Section: Discussion (mentioning)

confidence: 99%

Protein Sequence Design with a Learned Potential

Anand, Eguchi, Mathews, et al. 2020 (Preprint)


The primary challenge of fixed-backbone protein sequence design is to find a distribution of sequences that fold to the backbone of interest. In practice, state-of-the-art protocols often find viable but highly convergent solutions. In this study, we propose a novel method for fixed-backbone protein sequence design using a learned deep neural network potential. We train a convolutional neural network (CNN) to predict a distribution over amino acids at each residue position conditioned on the local structural environment around the residues. Our method for sequence design involves iteratively sampling from this conditional distribution. We demonstrate that this approach is able to produce feasible, novel designs with quality on par with the state-of-the-art, while achieving greater design diversity. In terms of generalizability, our method produces plausible and variable designs for a de novo TIM-barrel structure, showcasing its practical utility in design applications for which there are no known native structures.
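The iterative sampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_conditional` is a hypothetical stand-in for the trained CNN (it only looks at the two neighbouring residues, not at any 3D environment), and `sample_design`, the sweep count, and the random visiting order are assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_conditional(seq, i):
    """Hypothetical stand-in for the trained network: returns a probability
    distribution over amino acids at position i given the rest of the
    sequence. This toy version down-weights residues equal to a neighbour."""
    neighbours = {seq[j] for j in (i - 1, i + 1) if 0 <= j < len(seq)}
    weights = [0.2 if aa in neighbours else 1.0 for aa in AMINO_ACIDS]
    total = sum(weights)
    return [w / total for w in weights]


def sample_design(length, n_sweeps, rng, conditional=toy_conditional):
    """Start from a random sequence, then repeatedly resample one position
    at a time from the conditional distribution (assumed sampling scheme)."""
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    for _ in range(n_sweeps):
        # Visit every position once per sweep, in random order.
        for i in rng.sample(range(length), length):
            probs = conditional(seq, i)
            seq[i] = rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]
    return "".join(seq)


design = sample_design(length=30, n_sweeps=10, rng=random.Random(0))
print(design)
```

Because each position is drawn from a distribution rather than set to its most likely residue, repeated runs with different seeds give different sequences, which is the source of the design diversity the abstract emphasizes.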


“… 197
– | 3D CNN | gridded atomic coordinates | PDB-REDO | 19,436 | sequence recovery 70%, experimental validation of mutation | Shroff et al. 198
ProteinSolver | Graph NN | partial sequence, adjacency matrix | UniParc | residues | sequence recovery of 35%, folding and MD test with 4 proteins | Strokach et al, 2019 199
gcWGAN | CGAN | random noise + structure | SCOPe | 20,125 | diversity and TM score of prediction from designed sequence cVAE | Karimi et al. 200
– | Graph Transformer | backbone structure in graph | CATH based | 18,025 | perplexity: 6.56 (rigid), 11.13 (flexible) (random: 20.00) | Ingraham et al.…”

Section: Protein Design (mentioning)

confidence: 99%

“…They were able to validate designed sequences in silico and demonstrate that some designs folded to their target structures in vitro. 213 …”

Section: Protein Design (mentioning)

confidence: 99%

Deep Learning in Protein Structural Modeling and Design

et al. 2020


Summary: Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields, including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical to understand and engineer biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the “sequence → structure → function” paradigm. This review is directed to help both computational biologists to gain familiarity with the deep learning methods applied in protein modeling, and computer scientists to gain perspective on the biologically meaningful problems that may benefit from deep learning techniques.


“…Networks with architectures borrowed from language models have been trained on amino acid sequences, and been used to generate new sequences without considering protein structure explicitly [4,5]. Other methods have been developed to generate protein backbones without consideration of sequence [6], and to identify amino acid sequences which either fit well onto specified backbone structures [7][8][9] or are conditioned on low-dimensional fold representation [10]; models tailored to generate sequences and/or structures for specific protein families have also been developed [11][12][13][14]. However, none of the models described to date address the classical de novo protein design problem: generating new sequences predicted to fold to new structures.…”

Section: Introduction (mentioning)

confidence: 99%

De novo protein design by deep network hallucination

Anishchenko, Chidyausiku, Овчинников, et al. 2020 (Preprint)


There has been considerable recent progress in protein structure prediction using deep neural networks to infer distance constraints from amino acid residue co-evolution [1–3]. We investigated whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generated random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting distance maps, which as expected are quite featureless. We then carried out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (KL-divergence) between the distance distributions predicted by the network and the background distribution. Optimization from different random starting points resulted in a wide range of proteins with diverse sequences and all alpha, all beta sheet, and mixed alpha-beta structures. We obtained synthetic genes encoding 129 of these network hallucinated sequences, expressed and purified the proteins in E. coli, and found that 27 folded to monomeric stable structures with circular dichroism spectra consistent with the hallucinated structures. Thus deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute, alongside traditional physically based models, to the de novo design of proteins with new functions.
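The Monte Carlo search described in this abstract can be sketched in a few lines, under heavy assumptions. `toy_predict` is a hypothetical stand-in for a structure-prediction network (it is not trRosetta and predicts a single 4-bin distribution instead of a distance map), the uniform `BACKGROUND` is assumed, and the Metropolis acceptance rule with a fixed temperature is one common choice rather than something the abstract specifies.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_BINS = 4
BACKGROUND = [1.0 / N_BINS] * N_BINS  # assumed featureless background


def toy_predict(seq):
    """Hypothetical stand-in for the prediction network: maps a sequence
    to one probability distribution over N_BINS 'distance bins'."""
    counts = [1.0] * N_BINS  # pseudocounts keep every probability positive
    for aa in seq:
        counts[AMINO_ACIDS.index(aa) % N_BINS] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hallucinate(length, n_steps, temperature, rng):
    """Mutate one random position per step and keep the change according
    to a Metropolis rule that favours higher KL divergence."""
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = kl_divergence(toy_predict(seq), BACKGROUND)
    start_score = score
    for _ in range(n_steps):
        i = rng.randrange(length)
        old = seq[i]
        seq[i] = rng.choice(AMINO_ACIDS)
        new_score = kl_divergence(toy_predict(seq), BACKGROUND)
        delta = new_score - score
        # Always accept improvements; accept worse moves with prob exp(delta/T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            score = new_score
        else:
            seq[i] = old  # reject: restore the previous residue
    return "".join(seq), start_score, score


designed, start_kl, final_kl = hallucinate(
    length=20, n_steps=500, temperature=0.01, rng=random.Random(1)
)
print(designed, round(start_kl, 4), round(final_kl, 4))
```

Starting the search from different seeds corresponds to the "different random starting points" in the abstract; in the real setting each score evaluation would be a forward pass through the network.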

