2022
DOI: 10.1109/tcbb.2021.3108718

TripletProt: Deep Representation Learning of Proteins Based On Siamese Networks

Abstract: Pretrained representations have recently gained attention in various machine learning applications. Nonetheless, the high computational costs associated with training these models have motivated alternative approaches for representation learning. Herein we introduce TripletProt, a new approach for protein representation learning based on Siamese neural networks. Representation learning of biological entities that captures essential features can alleviate many of the challenges associated with supervised le…
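Per the abstract, TripletProt learns protein embeddings with Siamese networks, i.e., weight-shared encoders trained with a triplet objective that pulls an anchor protein toward a related (positive) protein and away from an unrelated (negative) one. The following is a minimal PyTorch sketch of that general setup; the toy encoder, the 1024-dimensional input features, the embedding size, and the margin are illustrative assumptions, not the TripletProt architecture or training data.

import torch
import torch.nn as nn

class ProteinEncoder(nn.Module):
    """Toy shared encoder: maps a fixed-length protein feature vector to an embedding.
    (Illustrative assumption; the paper's actual encoder differs.)"""
    def __init__(self, in_dim: int = 1024, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

encoder = ProteinEncoder()                  # the same weights embed all three branches
loss_fn = nn.TripletMarginLoss(margin=1.0)  # pulls anchor toward positive, pushes away negative

# Hypothetical batch of anchor / positive / negative protein feature vectors.
anchor, positive, negative = (torch.randn(32, 1024) for _ in range(3))
loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()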

Cited by 7 publications (5 citation statements)
References 40 publications
“…Here we covered the core challenges and considerations in supervising the models in fitness prediction, yet additional downstream analysis and posing insightful questions will give us more understanding and directions in discriminating the protein sequences based on their fitness. In order to improve the pretraining step, we might adopt techniques such as adjusting the masking rate [64], adding biological priors [60,65], increasing the model parameters [57], and building specialized language models for the desired fitness [66], given the growing data availability and computational resources. Additional studies are required for improved downstream fitness predictions, such as fine-tuning with a reduced chance of overfitting [67], incorporating the effect of post-translational modifications, and characterizing the performance of embeddings in different data setups [68] with varying protein types and fitnesses for supporting the development of novel proteins in diagnostics and therapeutics.…”
Section: Discussion (mentioning, confidence: 99%)
“…Here, we covered the core challenges and considerations in supervising the models in fitness prediction, yet additional downstream analysis and posing insightful questions will give us more understanding and directions in discriminating the protein sequences based on their fitness. In order to improve the pretraining step, we might adopt techniques such as adjusting the masking rate [72], adding biological priors [69,73], increasing the model parameters [66], and building specialized language models for the desired fitness [74], given the growing data availability and computational resources. Additional studies are required for improved downstream fitness predictions, such as fine-tuning with a reduced chance of overfitting [75], incorporating the effect of post-translational modifications, and characterizing the performance of embeddings in different data setups [76] with varying protein types and fitnesses for supporting the development of novel proteins in diagnostics and therapeutics.…”
Section: Discussion (mentioning, confidence: 99%)
“…There is growing interest in developing protein language models ( p LMs) at the scale of evolution due to the abundance of 1D amino acid sequences, such as the series of ESM (Rives et al, 2019; Lin et al, 2022), TAPE (Rao et al, 2019), ProtTrans (Elnaggar et al, 2021), PRoBERTa (Nambiar et al, 2020), PMLM (He et al, 2021), ProteinLM (Xiao et al, 2021), PLUS (Min et al, 2021), Adversarial MLM (McDermott et al, 2021), ProteinBERT (Brandes et al, 2022), CARP (Yang et al, 2022a) in masked language modeling (MLM) fashion, ProtGPT2 (Ferruz et al, 2022) in causal language modeling fashion, and several others (Melnyk et al, 2022a; Madani et al, 2021; Unsal et al, 2022; Nourani et al, 2021; Lu et al, 2020; Sturmfels et al, 2020; Strodthoff et al, 2020). These protein language models are able to generalize across a wide range of downstream applications and can capture evolutionary information about secondary and tertiary structures from sequences alone.…”
Section: Related Work (mentioning, confidence: 99%)
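Most of the protein language models listed in this statement are pretrained with masked language modeling. As a rough, model-agnostic illustration of that objective (not the code of any model cited above), the sketch below corrupts a protein sequence by masking a fraction of its residues; during pretraining the model is asked to recover the masked amino acids. The 15% masking rate, the 20-letter vocabulary, and the mask token string are assumptions for illustration.

import random

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")  # standard 20 residues (illustrative vocabulary)
MASK = "<mask>"                             # assumed mask token; real models define their own

def mask_sequence(seq: str, mask_rate: float = 0.15):
    """Replace a random subset of residues with MASK; return the corrupted tokens
    and the (position, original residue) pairs the model would be trained to predict."""
    tokens, targets = [], []
    for i, aa in enumerate(seq):
        if random.random() < mask_rate:
            tokens.append(MASK)
            targets.append((i, aa))
        else:
            tokens.append(aa)
    return tokens, targets

# Hypothetical protein fragment; in practice masking runs over large sequence corpora.
tokens, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")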