2021
DOI: 10.1101/2021.07.31.454572
Preprint

Protein language model embeddings for fast, accurate, alignment-free protein structure prediction

Abstract: All state-of-the-art (SOTA) protein structure predictions rely on evolutionary information captured in multiple sequence alignments (MSAs), primarily on evolutionary couplings (co-evolution). Such information is not available for all proteins and is computationally expensive to generate. Prediction models based on Artificial Intelligence (AI) using only single sequences as input are easier and cheaper but perform so poorly that speed becomes irrelevant. Here, we described the first competitive AI solution excl…

citations
Cited by 25 publications
(22 citation statements)
references
References 58 publications
(52 reference statements)
“…To repeat the previous speculation: embeddings might capture a reality that constrains what can be observed in evolution, and this reality is exactly what is used for the part of the SAV effect prediction that succeeds. If so, we would argue that our simplified method did not succeed because it predicted conservation without using MSAs, but that it captured positions biophysically "marked by constraints", i.e., residues with higher contact density in protein 3D structures (Weißenow et al 2021). This assumption would explain how predicted conservation (ProtT5cons) not using evolutionary information could predict SAV effects better than a slightly more correct approach (ConSeq) using MSAs to extract evolutionary information (Fig.…”
Section: Discussion (mentioning)
confidence: 98%
“…In fact, one pLM used here, namely ProtT5, has recently been shown to explicitly capture aspects of long-range inter-residue distances directly during pre-training. I.e., without ever being trained on any labeled data pLMs pick up structural constraints that allow protein 3D structure prediction from single protein sequences (Weißenow et al 2021). Another explanation for how ProtT5 embeddings capture conservation might be that pLMs picked up signals from short, frequently re-occurring sequence/structure motifs such as localization signals or catalytic sites that are more conserved than other parts of the sequence.…”
Section: Discussion (mentioning)
confidence: 99%
“…In fact, the pLMs used here, namely ProtT5, has recently been shown to explicitly capture aspects of long-range inter-residue distances directly during pre-training, i.e., without ever being trained on any labeled data, pLMs pick up structural constraints that allow protein 3D structure prediction from single protein sequences (Weißenow et al 2021). Another explanation for how ProtT5 embeddings capture conservation might be that pLMs picked up signals from short, frequently re-occurring sequence/structure motifs such as localization signals or catalytic sites that are more conserved than other parts of the sequence.…”
Section: Discussion (mentioning)
confidence: 99%
“…To repeat the previous speculation: embeddings might capture a reality that constraints what can be observed in evolution, and this reality is exactly what is used for the part of the SAV effect prediction that succeeds. If so, we would argue that our simplified method did not succeed because it predicted conservation without using MSAs, but that it captured positions biophysically "marked by constraints", i.e., residues with higher contact density in protein 3D structures (Weißenow et al 2021). This assumption would explain how predicted conservation (ProtT5cons) not using evolutionary information could predict SAV effects better than a slightly more correct approach (ConSeq) using MSAs to extract evolutionary information (Fig.…”
mentioning
confidence: 99%
“…AlphaFold 2 heavily relies on information from multiple sequence alignments (MSAs). Recent structure predictions without MSAs remain less accurate [16]. Either way, it remains unclear to which extent structure predictions could improve the prediction of binding residues beyond the unique opportunity to step up from binding residues to binding sites.…”
Section: Introduction (mentioning)
confidence: 99%