2022
DOI: 10.1101/2022.12.21.521521
Preprint

Language models generalize beyond natural proteins

Abstract: Learning the design patterns of proteins from sequences across evolution may have promise toward generative protein design. However, it is unknown whether language models, trained on sequences of natural proteins, will be capable of more than memorization of existing protein families. Here we show that language models generalize beyond natural proteins to generate de novo proteins. We focus on two protein design tasks: fixed backbone design where the structure is specified, and unconstrained generation where th…

Cited by 93 publications (107 citation statements)
References 73 publications
“…Concurrently, Verkuil et al. (2022) demonstrate that pLMs at scale can generalize beyond natural proteins to generate de novo proteins, and validate their hypothesis in silico and experimentally in great detail, showing that pLMs are capable of designing protein structure even though they were trained only on sequences.…”
Section: G Related Work
mentioning
confidence: 54%
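To make the mechanism in this citation concrete, here is a minimal Gibbs-style sampling sketch of how a masked protein language model can produce a novel sequence. The `model`/`tokenizer` interface is an assumed stand-in, not the authors' actual API, and Verkuil et al.'s real procedure (an energy-guided MCMC that also scores predicted structure) is considerably more involved; this only illustrates the core idea of generating sequences from a model trained purely on sequences.

```python
import torch

def gibbs_sample_sequence(model, tokenizer, length=100, n_rounds=500):
    # Assumed interface: model(tokens) -> logits of shape (batch, length, vocab);
    # tokenizer.mask_id is the mask token; tokenizer.decode maps ids to letters.
    tokens = torch.full((1, length), tokenizer.mask_id, dtype=torch.long)
    for _ in range(n_rounds):
        pos = int(torch.randint(length, (1,)))       # pick a random position
        tokens[0, pos] = tokenizer.mask_id           # hide it from the model
        with torch.no_grad():
            logits = model(tokens)                   # pLM forward pass
        probs = torch.softmax(logits[0, pos], dim=-1)
        tokens[0, pos] = int(torch.multinomial(probs, 1))  # resample from the conditional
    return tokenizer.decode(tokens[0].tolist())
```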
“…Recently, Lin et al. (2022) show that the sequential evolutionary knowledge learned by pLMs corresponds deeply to protein structure, enabling structure prediction from single sequences. In a concurrent work, Verkuil et al. (2022) further demonstrate that pLMs at scale can synthesize de novo proteins on the basis of the deep grammars learned from large-scale native sequences, generalizing beyond natural proteins at a high experimental success rate. Likewise, in the recent advances in generative AI algorithms in general, e.g.…”
Section: Discussion
mentioning
confidence: 86%
“…The relaxed sequence hallucination method provides substantial efficiency advantages relative to the more commonly used MCMC methods. For example, recently described de novo designed luciferase enzymes were produced by MCMC hallucination with 30,000 iterations 4 , and large-language-model-based designs required up to 170,000 iterations 9 . By contrast, our gradient-descent approach typically converged in fewer than 100 iterations.…”
Section: Discussion
mentioning
confidence: 99%
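A minimal sketch of the relaxed (continuous) hallucination this quote describes, assuming a differentiable `loss_fn` as a hypothetical stand-in for the design network's forward pass. Relaxing the discrete sequence into per-position logits lets gradient descent replace thousands of discrete MCMC proposals:

```python
import torch

AA_VOCAB = 20  # canonical amino acids

def relaxed_hallucinate(loss_fn, length=100, n_steps=100, lr=0.1):
    # Continuous relaxation: per-position logits over amino acids are
    # optimized directly by gradient descent instead of discrete mutations.
    logits = torch.randn(length, AA_VOCAB, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(n_steps):                     # the quote reports <100 steps suffice
        probs = torch.softmax(logits, dim=-1)    # soft ("relaxed") sequence
        loss = loss_fn(probs)                    # differentiable design loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return logits.detach().argmax(dim=-1)        # discretize to a hard sequence
```

The efficiency gain comes from updating every position at once using gradient information, rather than testing one random mutation per forward pass.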
“…A DL-based design method called ‘deep network hallucination’ 8 leverages this connection for a variety of protein design problems by iteratively updating a protein sequence until a desired property, encoded in a mathematical loss function, is obtained. Typically, when performing deep network hallucination, random mutations are applied in a Markov chain Monte Carlo (MCMC) fashion 4,6,9–11 . However, this method can be computationally inefficient due to the large number of iterations required.…”
Section: Introduction
mentioning
confidence: 99%
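For contrast with the gradient-based variant sketched above, here is a minimal Metropolis-style MCMC hallucination loop of the kind this quote describes; `loss_fn` is again a hypothetical stand-in for a forward pass through a structure-prediction or language-model network scoring the current sequence:

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mcmc_hallucinate(loss_fn, length=100, n_steps=30000, temperature=0.1):
    # Start from a random sequence; propose single point mutations and
    # accept or revert each with the Metropolis criterion on the design loss.
    seq = [random.choice(AMINO_ACIDS) for _ in range(length)]
    loss = loss_fn("".join(seq))
    for _ in range(n_steps):                      # tens of thousands of steps are typical
        pos = random.randrange(length)
        old = seq[pos]
        seq[pos] = random.choice(AMINO_ACIDS)     # random point mutation
        new_loss = loss_fn("".join(seq))
        if new_loss < loss or random.random() < math.exp((loss - new_loss) / temperature):
            loss = new_loss                       # accept the mutation
        else:
            seq[pos] = old                        # revert it
    return "".join(seq), loss
```

Each iteration costs a full forward pass to score one candidate mutation, which is why the iteration counts cited above (30,000 to 170,000) dwarf those of the gradient-based approach.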