2022
DOI: 10.1101/2022.07.13.499967
Preprint

ProteinSGM: Score-based generative modeling for de novo protein design

Abstract: Score-based generative models are a novel class of generative models that have shown state-of-the-art sample quality in image synthesis, surpassing the performance of GANs in multiple tasks. Here we present ProteinSGM, a score-based generative model that produces realistic de novo proteins and can inpaint plausible backbones and functional sites into structures of predefined length. With unconditional generation, we show that score-based generative models can generate native-like protein structures, surpassing…
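The abstract describes score-based generative modeling only at a high level. As a hedged illustration of the general technique (a generic variance-exploding score SDE, not ProteinSGM's own network, data representation, or code), the sketch below draws samples from a toy one-dimensional Gaussian by integrating the reverse-time SDE with Euler-Maruyama steps. The learned score network is replaced by the exact score of the noised Gaussian; the function names, the noise schedule (sigma_min, sigma_max), and the target parameters MU and S are all assumptions chosen for illustration.

```python
import numpy as np

# Toy target distribution (assumption): data ~ N(MU, S^2) in one dimension.
MU, S = 2.0, 0.5


def ve_sigma(t, sigma_min=0.01, sigma_max=10.0):
    """Geometric noise schedule sigma(t) for t in [0, 1]."""
    return sigma_min * (sigma_max / sigma_min) ** t


def toy_score(x, sigma):
    """Exact score of N(MU, S^2) convolved with N(0, sigma^2).

    Stands in for a trained score network s_theta(x, t).
    """
    return -(x - MU) / (S**2 + sigma**2)


def sample_reverse_ve_sde(n_samples=20000, n_steps=1000,
                          sigma_min=0.01, sigma_max=10.0, seed=0):
    rng = np.random.default_rng(seed)
    # Start from the wide prior N(0, sigma_max^2) at t = 1.
    x = rng.normal(0.0, sigma_max, size=n_samples)
    dt = 1.0 / n_steps
    log_ratio = np.log(sigma_max / sigma_min)
    for i in range(n_steps, 0, -1):
        t = i * dt
        sigma = ve_sigma(t, sigma_min, sigma_max)
        g2 = 2.0 * sigma**2 * log_ratio  # g(t)^2 = d[sigma(t)^2]/dt
        # Euler-Maruyama step backwards in time: drift g^2 * score, then fresh noise.
        x = x + g2 * toy_score(x, sigma) * dt
        x = x + np.sqrt(g2 * dt) * rng.normal(size=n_samples)
    return x


samples = sample_reverse_ve_sde()
print(round(float(samples.mean()), 2), round(float(samples.std()), 2))
```

The printed mean and standard deviation should land close to MU and S. In an actual protein model, the same reverse-time loop would run over a protein representation with a trained score network in place of `toy_score`.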

Cited by 8 publications (8 citation statements); references 38 publications.
“…For all in silico benchmarks in this paper, we use the AF2 structure prediction network 21 for validation and define an in silico “success” as an RFdiffusion output for which the AF2 structure predicted from a single sequence is (1) of high confidence (mean predicted aligned error, pAE, < 5), (2) globally within 2Å backbone-RMSD of the designed structure, and (3) within 1Å backbone-RMSD on any scaffolded functional-site. This definition of success is significantly more stringent than those described elsewhere (refs [5,8,16,25], Fig. S3A-B) but is a good predictor of experimental success 4,7,26.…”
Section: Main (mentioning)
confidence: 91%
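The three-part success criterion in the quoted passage can be written as a simple filter. This is a minimal sketch, not the RFdiffusion authors' code: it assumes the design and the AF2 prediction are already available as matched (N, 3) arrays of backbone-atom coordinates in the same order, that mean pAE has been computed elsewhere, and that RMSD is measured after optimal rigid superposition (Kabsch). The function names, the hypothetical `motif_rows` argument, and superimposing the motif on its own atoms for criterion (3) are illustrative assumptions.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinate arrays after optimal superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                   # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T       # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def is_in_silico_success(mean_pae, design_xyz, pred_xyz, motif_rows=None,
                         pae_cutoff=5.0, global_cutoff=2.0, motif_cutoff=1.0):
    """Apply the three quoted thresholds (pAE and RMSD both in Angstrom)."""
    # (1) high confidence: mean pAE below 5
    if mean_pae >= pae_cutoff:
        return False
    # (2) global backbone RMSD within 2 A of the designed structure
    if kabsch_rmsd(pred_xyz, design_xyz) > global_cutoff:
        return False
    # (3) functional-site backbone RMSD within 1 A (motif aligned on itself: an assumption)
    if motif_rows is not None and len(motif_rows) > 0:
        rows = np.asarray(motif_rows)
        if kabsch_rmsd(pred_xyz[rows], design_xyz[rows]) > motif_cutoff:
            return False
    return True
```

Because the RMSD is computed after superposition, a prediction that differs from the design only by a rigid rotation and translation passes criteria (2) and (3); the pAE check is independent of the coordinates.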
“…In RFdiffusion, we input motifs as 3D coordinates (including sequence and sidechains) both during conditional training and inference, and RFdiffusion builds scaffolds that hold the motif atomic coordinates in place. A number of deep learning methods have been developed recently to address this problem, including RFjoint Inpainting 4, constrained Hallucination 4, and other DDPMs 5,8,25. To rigorously evaluate the performance of these methods in comparison to RFdiffusion across a broad set of design challenges, we established an in silico benchmark test comprising 25 motif-scaffolding design problems addressed in six recent publications encompassing several design methodologies 4,5,25,36-38.…”
Section: Main (mentioning)
confidence: 99%
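The quoted passage says RFdiffusion takes the motif coordinates as input and builds a scaffold that keeps them in place. A much simpler way to hold a motif fixed, common as a baseline for diffusion-model inpainting, is to overwrite the motif positions after every sampling step ("replacement"). The sketch below shows only that generic replacement idea; it is not how RFdiffusion conditions on motifs. `sample_with_fixed_motif` and `toy_denoise_step` are hypothetical names, and the toy denoiser merely stands in for a trained model's reverse-diffusion update.

```python
import numpy as np


def sample_with_fixed_motif(denoise_step, x_init, motif_mask, motif_xyz, n_steps):
    """Run a reverse-diffusion loop while clamping motif coordinates.

    denoise_step(x, step) -> new (N, 3) coordinates (hypothetical model update).
    motif_mask: boolean array of shape (N,), True for motif residues.
    motif_xyz: array of shape (motif_mask.sum(), 3) holding the fixed motif.
    """
    x = np.array(x_init, dtype=float, copy=True)
    x[motif_mask] = motif_xyz
    for step in reversed(range(n_steps)):
        x = denoise_step(x, step)
        # Replacement: overwrite motif positions so only the scaffold evolves.
        x[motif_mask] = motif_xyz
    return x


# Toy usage: a dummy "denoiser" that shrinks coordinates and adds a little noise.
rng = np.random.default_rng(0)


def toy_denoise_step(x, step):
    return 0.9 * x + 0.1 * rng.normal(size=x.shape)


n_res = 20
mask = np.zeros(n_res, dtype=bool)
mask[5:9] = True                               # residues 5-8 form the motif
motif = rng.normal(size=(4, 3)) * 3.0
x0 = rng.normal(size=(n_res, 3)) * 10.0
out = sample_with_fixed_motif(toy_denoise_step, x0, mask, motif, n_steps=50)
```

Replacement guarantees exact motif coordinates but gives the scaffold no direct signal about the motif beyond what the model infers from the clamped positions, which is one motivation for conditioning a model on the motif during training instead.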
“…GENESIS [86] implemented a VAE that takes secondary structure sketches and outputs contact maps with finer definitions of secondary structural elements. ProteinSGM [87] uses stochastic differential equations (SDE) to generate matrices that capture distance and torsional angles, which are then passed to Rosetta to produce 3D folded structures. FoldingDiff [88] merges this two-step approach by using a set of six internal angles, directly producing good quality structures without needing other methods like Rosetta for refinement.…”
Section: The Deep Learning Era Of Protein Sequence and Structure Gene... (mentioning)
confidence: 99%
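The quoted passage contrasts two protein representations: 2D inter-residue matrices of distances and angles (ProteinSGM, with Rosetta used to recover 3D coordinates) and a set of internal angles (FoldingDiff). The sketch below computes the simplest member of each family from Cartesian coordinates: a C-alpha distance matrix, a bond angle, and a signed dihedral (torsion) angle. The exact channels ProteinSGM generates and the exact six angles FoldingDiff uses are not specified in this passage, so these helpers are illustrative assumptions rather than either method's featurization.

```python
import numpy as np


def distance_matrix(ca):
    """Pairwise Euclidean distances for an (N, 3) array of C-alpha coordinates."""
    diff = ca[:, None, :] - ca[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def bond_angle(p0, p1, p2):
    """Angle at p1 (degrees) formed by the points p0-p1-p2."""
    u, v = p0 - p1, p2 - p1
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees, in [-180, 180]) about the p1-p2 bond."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# Three C-alpha atoms 3.8 A apart along a right-angled path.
ca = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0]])
D = distance_matrix(ca)
```

A distance matrix is invariant to rotations and translations but loses chirality, which is one reason 2D-matrix approaches pair distances with orientation angles and need a separate folding step such as Rosetta, whereas internal angles can be chained directly back into 3D coordinates.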
“…The field of computational protein design has achieved major breakthroughs in recent years [1] in terms of its ability to design proteins and protein assemblies with diverse folds and functions (see, e.g., [2][3][4][5][6][7][8][9]), which has already found application in the design of therapeutically relevant biomolecules such as vaccines [10] and antibodies [11,12]. Such breakthroughs are built upon improvements in atomistic modeling techniques, such as the Rosetta software suite [13], and recent advances in machine learning-based structure prediction models [14][15][16][17][18], sequence design (or inverse folding) models [19][20][21][22][23][24][25], protein language models [26][27][28][29][30][31][32][33][34], and denoising diffusion probabilistic models [35][36][37][38][39][40][41][42][43][44].…”
mentioning
confidence: 99%