2023
DOI: 10.1101/2023.04.02.534383
Preprint

Sequence vs. Structure: Delving Deep Into Data-Driven Protein Function Prediction

Abstract: Predicting protein function is a longstanding challenge that has significant scientific implications. The success of amino acid sequence-based learning methods depends on the relationship between sequence, structure, and function. However, recent advances in AlphaFold have led to highly accurate protein structure data becoming more readily available, prompting a fundamental question: given sufficient experimental and predicted structures, should we use structure-based learning methods instead of sequence-based…

Cited by 3 publications (3 citation statements)
References: 45 publications
“…Even for transmembrane proteins, which are typically underrepresented among proteins with experimental 3D structures, the corresponding structure predictions are highly accurate [43]. Unfortunately, current techniques for extracting protein representations from predicted structures do not surpass the performance of protein representations extracted directly from the proteins’ amino acid sequence for functional predictions [23]. However, once methods emerge that effectively extract the relevant information from protein 3D structures, their utilization will likely improve model performance.…”
Section: Discussion (mentioning)
confidence: 99%
“…We aimed to predict whether a given molecule is a substrate for a particular transport protein from the molecular structure of the molecule and the linear amino acid sequence of the protein, without relying on the protein's 3D structure. Although 3D structures can be predicted for most proteins [20][21][22], current deep learning tools can extract functional information much more easily from the amino acid sequence than from the 3D structure [23]. We generated highly informative transporter and substrate representations by using two Transformer Networks trained to process protein amino acid sequences [17] and string representations of small molecules [18], respectively.…”
Section: Introduction (mentioning)
confidence: 99%
“…It is generally agreed that for many proteins, sequence determines structure, and structure strongly influences function. Thus, there have been efforts to enrich protein representations by incorporating structural information using voxels, contact maps, or graph neural networks. However, these have not led to significant performance improvements, likely because variant structures vary in subtle yet impactful ways which are challenging to model and extremely difficult to observe experimentally, despite an explosion in protein structure prediction tools. Many available protein structures may be quite noisy or inaccurate.…”
Section: Navigating Protein Fitness Landscapes Using Machine Learning (mentioning)
confidence: 99%