Preprint (2022)
DOI: 10.1101/2022.12.01.518682

Illuminating protein space with a programmable generative model

Abstract: Three billion years of evolution have produced a tremendous diversity of protein molecules, and yet the full potential of this molecular class is likely far greater. Accessing this potential has been challenging for computation and experiments because the space of possible protein molecules is much larger than the space of those likely to host function. Here we introduce Chroma, a generative model for proteins and protein complexes that can directly sample novel protein structures and sequences and that can be…

Cited by 85 publications (110 citation statements)
References 75 publications
“…More recently, deep-learning-based methods have increased the complexity of designable structures (17, 20, 22) and machine-learning-based generative models have shown increasingly sophisticated design capabilities. These include sequence-based Potts models and autoregressive language models for designing sequences (25–27), Markov Chain Monte Carlo algorithms combined with structure prediction for jointly designing sequences and structures (17, 20, 22), inverse folding models that use structural backbone coordinates to design sequences (21, 23), and concurrent work using diffusion models for designing protein backbones (28, 29). A key contribution of this study is to combine the modularity aspired to by classical methods with the power of modern generative models, in particular improvements in the accuracy and efficiency of language-model-based protein structure prediction (16).…”
Section: Related Work
confidence: 99%
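To ground the autoregressive language models this excerpt mentions, below is a minimal sketch of left-to-right amino-acid sampling, where the sequence likelihood factorizes as p(seq) = Π_t p(aa_t | aa_<t). Everything here is a hypothetical illustration: sample_sequence, next_token_probs, and the uniform stand-in model are assumptions, not code from any cited paper.

import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sample_sequence(next_token_probs, length):
    """Draw a sequence left to right from a learned conditional p(aa | prefix)."""
    seq = []
    for _ in range(length):
        probs = next_token_probs(seq)   # the model scores the next residue given the prefix
        seq.append(random.choices(AMINO_ACIDS, weights=probs)[0])
    return "".join(seq)

# Uniform stand-in for a trained model, purely for demonstration.
uniform_model = lambda prefix: [1.0] * len(AMINO_ACIDS)
print(sample_sequence(uniform_model, 12))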
“…Similar to (Dauparas et al., 2022), several other state-of-the-art protein design algorithms involve a sequence decoder trained to maximize p(seq | structure) (Watson et al., 2022; Ingraham et al., 2022).…”
Section: Theory
confidence: 99%
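As a hedged illustration of the objective this quote describes, the toy sketch below minimizes cross-entropy between predicted and true residues, which is equivalent to maximizing log p(seq | structure) under a per-residue factorization. The linear decoder and random coordinates are placeholder assumptions, not the architecture of any of the cited models.

import torch
import torch.nn as nn

NUM_AA = 20
decoder = nn.Linear(3, NUM_AA)             # toy decoder: per-residue CA coordinates -> residue logits

coords = torch.randn(8, 3)                 # 8 residues with 3D backbone coordinates (random toy input)
true_seq = torch.randint(0, NUM_AA, (8,))  # the native sequence, as integer labels

logits = decoder(coords)                   # per-residue scores for p(seq_i | structure)
loss = nn.functional.cross_entropy(logits, true_seq)  # -log p(seq | structure)
loss.backward()                            # gradients point toward higher sequence likelihood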
“…Moreover, the higher or more specialized the functionality desired, the rarer such sequences become [46]. The rational design of protein sequences with programmed function requires models of the sequence-function (i.e., genotype-phenotype) relationship and a means to guide sampling from this distribution to generate plausible candidate sequences with the desired functionality for experimental synthesis and testing [4, 7].…”
Section: Introduction
confidence: 99%
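One concrete reading of "a means to guide sampling" is Metropolis-style mutation sampling under a learned fitness predictor: propose a point mutation, then accept or revert it based on the predicted change in fitness. The sketch below is an assumption, with a made-up toy fitness function standing in for a real sequence-function model; it is not the procedure of the cited works.

import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def guided_sample(seq, fitness, steps=1000, temperature=0.1):
    """Metropolis sampling over sequences, biased toward high predicted fitness."""
    seq = list(seq)
    score = fitness("".join(seq))
    for _ in range(steps):
        i = random.randrange(len(seq))
        old = seq[i]
        seq[i] = random.choice(AMINO_ACIDS)        # propose a point mutation
        new_score = fitness("".join(seq))
        if math.log(random.random()) < (new_score - score) / temperature:
            score = new_score                      # accept the mutation
        else:
            seq[i] = old                           # revert it
    return "".join(seq), score

# Toy stand-in for a sequence-function model: reward hydrophobic content.
toy_fitness = lambda s: sum(aa in "AILMFVW" for aa in s) / len(s)
print(guided_sample("ACDEFGHIKLMNPQRST", toy_fitness, steps=500))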
“…Historically, the sequence-structure mapping has been adopted as a proxy for the sequence-function relationship, with the functional design task reduced to the engineering of a particular three-dimensional fold (e.g., optimization of an enzymatic active site, engineering of a binding cleft). In recent years, deep learning networks exploiting modern tools such as equivariance-inducing architectures and diffusion models have broken new paths in computational protein structure prediction with atomic-level accuracy [11, 12] and, very recently, programmability of desired three-dimensional structures [7]. These technological advances have also powered direct learning of the sequence-function relationship using approaches such as recurrent neural networks [13, 14], variational autoencoders [15–21], generative adversarial networks [22], reinforcement learning [23], and transformers [24–31].…”
Section: Introduction
confidence: 99%