2020
DOI: 10.1093/bioinformatics/btaa701

Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function

Abstract: Motivation: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require a lot of labeled training data, which are not available for this task. However, a very large number of protein sequences without functional labels is available. Results: We applied an existing deep sequ…
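The workflow the abstract describes — embeddings from an unsupervised sequence model reused as fixed features for a supervised function classifier — can be sketched as follows. This is a minimal illustration, not the paper's method: the embedding matrix and GO-term labels below are random stand-ins, and a one-vs-rest logistic regression stands in for whatever classifier a given study uses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)

# Stand-in for pretrained per-protein embeddings
# (e.g. 1024-dimensional vectors from a model such as SeqVec).
X = rng.normal(size=(200, 1024))

# Stand-in multi-label annotations: 5 hypothetical GO terms per protein.
Y = (rng.random((200, 5)) < 0.3).astype(int)

# One binary classifier per GO term, trained on the frozen embeddings.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)

pred = clf.predict(X)  # binary indicator matrix, one column per term
```

The point of the setup is that the expensive part (learning the representation) needs no functional labels; only the lightweight per-term classifiers are trained on the small labeled set.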

Cited by 80 publications (83 citation statements)
References 39 publications (57 reference statements)
“…We already showed that SeqVec embeddings achieve competitive performance when applied in the task of molecular function prediction [22]. We built upon this previous work to characterise SeqVec-based molecular function prediction.…”
Section: Results Characterisation: Evaluating SeqVec-based Molecular Function Prediction
confidence: 99%
“…Whereas the new generation techniques usually outperform the established BLAST baseline method, they also require vast amounts of protein data, which are not always comprehensive. Therefore, the most recent approaches often turn to automatic representation learning, by which a complex model (often a neural network) learns some abstract features of a protein sequence which contain useful information for a consequent computational function prediction task [20][21][22].…”
Section: Introduction
confidence: 99%
“…However, it was better than ProtVec [31] which is a context-independent model. In some tasks, such as protein function prediction, it outperformed one-hot encoding of k-mer-based embeddings and showed the competitive results obtained using ELMo [75].…”
Section: Survey of Representation Learning Applications in Sequence Analysis
confidence: 99%
“…It can be further enhanced by unsupervised pre-training, where a generic protein representation is learned [83][84][85][86] from all available sequences. This representation can then be used to predict GO terms [87]. But end-to-end training is also possible, where the weights of the unsupervised feature extractor are also fine-tuned to create an ontology-specific feature representation designed for predicting GO terms from that ontology.…”
Section: Protein Representation
confidence: 99%
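The distinction this last citation draws — using a frozen pretrained representation versus fine-tuning the feature extractor end-to-end for a specific ontology — can be illustrated with a toy two-layer model. Everything here is a stand-in (random "pretrained" weights, random embeddings and labels); the only point is which parameters receive gradient updates in each regime.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 32))             # stand-in protein embeddings
y = (rng.random(64) < 0.5).astype(float)  # stand-in binary GO-term labels

W1 = rng.normal(size=(32, 16)) * 0.1      # "pretrained" feature extractor
w2 = rng.normal(size=16) * 0.1            # task-specific prediction head

def forward(X):
    h = np.tanh(X @ W1)                   # extracted features
    p = 1.0 / (1.0 + np.exp(-(h @ w2)))   # predicted probability
    return h, p

fine_tune = False                         # True = end-to-end training
lr, losses = 0.5, []
for _ in range(200):
    h, p = forward(X)
    losses.append(-np.mean(y * np.log(p + 1e-9)
                           + (1 - y) * np.log(1 - p + 1e-9)))
    g = (p - y) / len(y)                  # gradient of BCE loss w.r.t. logits
    if fine_tune:                         # extractor updated only end-to-end
        gh = np.outer(g, w2) * (1.0 - h**2)  # backprop through tanh
        W1 -= lr * X.T @ gh
    w2 -= lr * h.T @ g                    # head is always trained
```

With `fine_tune = False` the pretrained representation stays generic and only the head adapts; flipping it to `True` lets the gradients reshape `W1` into an ontology-specific representation, which is the trade-off the quoted passage describes.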