2023
DOI: 10.1101/2023.03.09.531927
Preprint

Clustering predicted structures at the scale of the known protein universe

Abstract: Proteins are key to all cellular processes and their structure is important in understanding their function and evolution. Sequence-based predictions of protein structures have increased in accuracy with over 214 million predicted structures available in the AlphaFold database (AFDB). However, studying protein structures at this scale requires highly efficient methods. Here, we developed a structural-alignment based clustering algorithm - Foldseek cluster - that can cluster hundreds of millions of structures. …

Cited by 34 publications (49 citation statements)
References 43 publications
“…The availability of high quality predicted structures allows us to expand functional curation efforts to include structure-based similarities. One useful approach for this is structure-based alignment, which is described in (Barrio-Hernandez et al, 2023), and also incorporated into our examples to find similarities to proteins with known PDB structures using Foldseek. Here, we look into another angle of structure comparison, which is based on the concept of a “structural outlier”.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…Recently, several complementary approaches have been developed to categorise the diversity of the protein universe and uncover novelties (Akdel et al, 2022; Barrio-Hernandez et al, 2023; Bordin et al, 2023), again highlighting the importance of incorporating multiple perspectives and methods in protein function annotation. These approaches showcase the significance of using a diverse set of information to gain a more complete understanding of protein function and its role in cellular processes.…”
Section: Conclusion: Towards Large-scale Protein Function Annotation
Citation type: mentioning
confidence: 99%
“…Clustering similar structures is one way to annotate for function [63]. Alternatively, many enzymes have common "modules," or recurring residue arrangements, which perform similar reactions [64]. The structures of active sites in unlabeled protein structures could be compared to existing structures to identify new, diverse sets of proteins with given function, using models trained on sequence and structure.…”
Section: Annotation Of Enzyme Activity Among Known Proteins
Citation type: mentioning
confidence: 99%
“…Dedicated data mining demands a clearly stated working hypothesis. While several groups have pursued intensive model classifications against AlphaFold DB (Barrio‐Hernandez et al, 2023; Bordin et al, 2023; Durairaj et al, 2023), this bird's‐eye approach could miss unique and intriguing proteins. To find these hidden gems, we defined a very specific database search question: are there monomeric proteins that contain multiple phosphate‐binding loops (P‐loops) on a single continuous β‐sheet?…”
Section: Introduction
Citation type: mentioning
confidence: 99%