2022
DOI: 10.1101/2022.11.28.518224
Preprint

Fast protein structure searching using structure graph embeddings

Abstract: Comparing and searching protein structures independent of primary sequence has proved useful for remote homology detection, function annotation and protein classification. With the recent leap in accuracy of protein structure prediction methods and increased availability of protein models, attention is turning to how to best make use of this data. Fast and accurate methods to search databases of millions of structures will be essential to this endeavour, in the same way that fast protein sequence searching und…

Cited by 10 publications (14 citation statements) | References 67 publications
“…45 that has been pre-trained using supervised contrastive learning for embedding protein structures into a low-dimensional latent space (Figure 3a). The pre-trained E-GNN’s latent space clusters the embeddings of similar protein structures together while separating dissimilar ones from one another 45. We reasoned that using these E-GNN-derived embeddings as features within CatPred can complement the sequence-attention and pLM features.…”
Section: Results
Mentioning confidence: 99%
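The quote above describes an E-GNN encoder pre-trained with supervised contrastive learning so that embeddings of similar structures cluster together while dissimilar ones are pushed apart. As a rough illustration of that kind of objective, here is a minimal SupCon-style loss in PyTorch; the batch layout, temperature, and the idea of defining positives by a shared structural-class label are assumptions for this sketch, not details taken from the cited paper.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """SupCon-style loss: samples sharing a label act as positives."""
    z = F.normalize(embeddings, dim=1)               # unit vectors -> cosine geometry
    sim = z @ z.T / temperature                      # (N, N) scaled similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)    # avoid division by zero
    # Average log-probability over each anchor's positives, then over anchors.
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return loss.mean()

# Toy usage: 8 structure embeddings in 4 hypothetical structural classes.
emb = torch.randn(8, 128)
lbl = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(supervised_contrastive_loss(emb, lbl).item())
```

Minimising this loss pulls same-class embeddings together in cosine space, which is what lets the frozen latent space later serve as a structural similarity metric.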
“…The final enzyme and molecular representations are concatenated and input to a fully connected neural network that outputs two real values representing the mean and the variance. The E-GNN pre-trained model and its pre-trained weights as described in ref 45 are used without modification to extract the structural features. For each enzyme 3D structure, this yielded a 128-dimensional embedding.…”
Section: Methods
Mentioning confidence: 99%
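The methods text above describes a fully connected head that takes the concatenated enzyme and molecule representations and emits a mean and a variance. Below is a minimal sketch of such a head, assuming a 128-dimensional structure embedding as in the quote; the molecule-embedding width, hidden width, the softplus used to keep the variance positive, and all names are illustrative assumptions rather than the cited paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanVarianceHead(nn.Module):
    """FC head mapping a fused (enzyme, molecule) representation to (mean, variance)."""

    def __init__(self, enzyme_dim: int = 128, molecule_dim: int = 128,
                 hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(enzyme_dim + molecule_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),   # two real outputs: raw mean, raw variance
        )

    def forward(self, enzyme_repr: torch.Tensor, molecule_repr: torch.Tensor):
        fused = torch.cat([enzyme_repr, molecule_repr], dim=-1)  # concatenate features
        mean, raw_var = self.net(fused).unbind(dim=-1)
        # Softplus keeps the predicted variance strictly positive.
        return mean, F.softplus(raw_var) + 1e-6

# Toy usage with a batch of 4 fused enzyme/molecule pairs.
head = MeanVarianceHead()
mu, var = head(torch.randn(4, 128), torch.randn(4, 128))
```

Predicting a variance alongside the mean is a common way to give such a regressor a per-sample uncertainty estimate.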
“…Furthermore, as demonstrated in Appendix D, the metric could be used not only as a global indicator but also in a sample-wise manner to pinpoint underrepresented regions, showing promise in quantifying the sampling limitations of existing models and guiding future improvements. While we currently use a third-party model [34] to measure the pairwise distance between structures for the implementation of the above metric, it is worth noting that other distance functions can be similarly utilized, offering flexibility in its implementation. We anticipate that the development of this metric could facilitate the systematic evaluation of generative models in protein design.…”
Section: Discussion
Mentioning confidence: 99%
“…g., TM-align), which involve a series of dynamic programming and heuristic iterative algorithms to refine optimal solutions. Therefore, we resort to a third-party tool [34], which employs supervised contrastive learning to learn a metric for structure comparison. In practice, we read the coordinates of all samples, encode them with the model to get their vectorized representations, and then compute the pairwise cosine similarities, following the designated usage of the model. The choice of k is a hyperparameter that should be determined prior to the computation of the metric.…”
Section: Implementation Detail
Mentioning confidence: 99%
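To make the quoted procedure concrete, the sketch below takes a matrix of model-derived embeddings, computes all pairwise cosine similarities, and scores each sample by the mean similarity to its k nearest neighbours, so low scores flag underrepresented regions as in the discussion above. The `topk_neighbour_similarity` helper, the random placeholder embeddings, and the default k are assumptions for illustration; in the cited work the vectors come from the third-party structure encoder [34].

```python
import numpy as np

def pairwise_cosine(embeddings: np.ndarray) -> np.ndarray:
    """Cosine similarity between all pairs of row vectors."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

def topk_neighbour_similarity(embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean similarity of each sample to its k nearest neighbours (self excluded)."""
    sim = pairwise_cosine(embeddings)
    np.fill_diagonal(sim, -np.inf)        # exclude self-matches
    topk = np.sort(sim, axis=1)[:, -k:]   # k largest similarities per row
    return topk.mean(axis=1)

# Placeholder: random vectors stand in for encoder output (e.g., 128-d embeddings).
emb = np.random.default_rng(0).normal(size=(100, 128))
scores = topk_neighbour_similarity(emb, k=5)  # low scores suggest sparse regions
```

Since k must be fixed before the metric is computed, it behaves exactly as the quote describes: a hyperparameter chosen ahead of the pairwise similarity pass.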