2022
DOI: 10.1101/2022.06.23.497276
Preprint

SETH predicts nuances of residue disorder from protein embeddings

Abstract: Predictions of millions of protein 3D structures are only a few clicks away since the release of AlphaFold2 results for entire data sets. However, many proteins have so-called intrinsically disordered regions (IDRs) that do not adopt unique structures in isolation. These IDRs are associated with several diseases, including Alzheimer's Disease. We showed that the absence of reliable AlphaFold2 predictions correlated only to a limited extent with IDRs. In contrast, many expert methods predict IDRs directly and r…

Citations: cited by 13 publications (15 citation statements)
References: 120 publications (373 reference statements)
“…We note that these results have been reproduced by Ilzhoefer et al(71) after the original release of our work in a bioRxiv manuscript from May 26, 2022…”
Section: Transformer
Citation type: supporting
confidence: 78%
“…Panel A: residue level features: secondary structure, transmembrane topology, disordered residues, small molecule, nucleic or metal binding residues, residue conservation and average variation (23; 39; 42; 13; 29); Panel B: sequence-level features: predicted subcellular localization (64), and an excerpt of predicted GO-annotations (38); Panel C: effect of SAVs (wild-type sequence on x-axis, mutations on y-axis; darker color=higher effect) (42); and Panel D: predicted 3D structure (45). Interactive version at https://embed.predictprotein.org/#/Q9NZC2.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…SETH (29), a two-layer CNN, predicts the degree of intrinsic disorder of a residue as defined by the chemical shift Z-scores (CheZOD) (48), where values below 8 signify disorder and values above 8 signify order. Different pLMs were compared (ProtT5 (23), ProSE(10), ESM-1b (54), ProtBERT (23), SeqVec (24)) with ProtT5 numerically outperforming the others.…”
Section: Methods
Citation type: mentioning
confidence: 99%
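The threshold rule in the statement above (CheZOD Z-scores below 8 read as disorder, above 8 as order) can be sketched in a few lines of Python. This is only an illustration of the quoted rule, not SETH's code: the function name `label_residues` and the example scores are hypothetical, and treating a score of exactly 8 as ordered is an assumption, since the statement does not cover that case.

```python
from typing import List

# Threshold taken from the quoted statement: Z-scores below 8 signify disorder.
CHEZOD_THRESHOLD = 8.0


def label_residues(z_scores: List[float], threshold: float = CHEZOD_THRESHOLD) -> List[str]:
    """Map per-residue Z-scores to 'D' (disorder) or 'O' (order).

    Assumption: a score exactly equal to the threshold is labeled as ordered.
    """
    return ["D" if z < threshold else "O" for z in z_scores]


# Hypothetical per-residue scores, invented for illustration only.
example_scores = [2.5, 7.9, 8.0, 12.3]
labels = label_residues(example_scores)
print("".join(labels))
```

Binarizing in this way discards the graded "degree of disorder" that the continuous scores carry, so it is best read as a convenience for comparison with binary disorder annotations.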
“…In particular, we unconditionally (i.e., without priors on family, function or structure) generated a set of 100,000 protein sequences using ProtGPT2, and predicted secondary structure [22], Gene Ontology (GO) terms [44], residue ability to bind small molecules, nucleotides or metals [47], protein subcellular localization [46], transmembrane topology [41], residue conservation [42], residue disorder [43] and CATH family [45]. Remarkably, this generated a repertoire of 100,000 protein sequences with ∼12 predicted features of structure and function from a single script in approximately 3.5 hours (Supplement 1).…”
Section: Introduction
Citation type: mentioning
confidence: 99%