2022
DOI: 10.3390/axioms11090469

PTG-PLM: Predicting Post-Translational Glycosylation and Glycation Sites Using Protein Language Models and Deep Learning

Abstract: Post-translational glycosylation and glycation are common types of protein post-translational modifications (PTMs) in which a glycan binds to a protein enzymatically or nonenzymatically, respectively. They are associated with various diseases, such as coronavirus, Alzheimer’s, cancer, and diabetes. Identifying glycosylation and glycation sites is important for understanding their biological mechanisms. However, using experimental laboratory tools to identify PTM sites is time-consuming and costly. In …

Cited by 6 publications (4 citation statements)
References 57 publications
“…RF is an ensemble-based algorithm that is based on decision trees and is commonly used in computational biology for its simplicity and suitability for high-dimensional data. XGBoost is also an ensemble learning method that is based on tree boosting and uses gradient descent to deal with high-dimensional data [25], [26]. Multiple hyperparameters are experimented with each classifier using training and validation datasets.…”
Section: Traditional ML Methods (mentioning)
confidence: 99%
“…The sequence alignment is widely applied in computational biology (e.g., aligning protein sequences, cf. [15,16]), computational linguistics (e.g., spell correction, speech recognition, machine translation, information extraction, cf. [11]), and information security (e.g., impersonation attacks detection in cloud computing environments, cf.…”
Section: Background and Related Work (mentioning)
confidence: 99%
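
As a rough illustration of the sequence alignment mentioned in the statement above, the sketch below computes a global alignment score with the classic Needleman-Wunsch dynamic program. The function name `global_alignment_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; they are not taken from the cited papers, and practical protein aligners typically use substitution matrices instead.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not from the cited work.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    # Row 0: aligning an empty prefix of `a` against each prefix of `b`
    # costs one gap per character.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of `b`.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap, or b[j-1] with a gap.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACD", "ACD"))
    print(global_alignment_score("ACD", "AD"))
```

The same recurrence applies to any pair of strings, which is why it carries over from protein sequences to tasks such as spell correction.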
“…Similarly, protein language models (PLMs) based on the transformer architecture have found success in the field of proteomics. PLMs are trained on extensive datasets of protein sequences to capture underlying evolutionary patterns and extract semantic information embedded within the protein sequences [17, 18]. One of the basic pre-processing steps in NLP is tokenization, the splitting of the protein amino acid sequences into individual units of atomic information called tokens.…”
Section: Introduction (mentioning)
confidence: 99%
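
The tokenization step described in the statement above can be sketched as follows, assuming the simplest possible scheme in which each amino acid residue is one token. The vocabulary, the special tokens `<pad>` and `<unk>`, and the helper names `tokenize` and `encode` are hypothetical; real protein language models ship their own tokenizers and vocabularies, which may differ.

```python
# Hypothetical vocabulary: two special tokens followed by the 20 standard
# amino acids in one-letter code. Real PLM vocabularies may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPECIAL_TOKENS = ["<pad>", "<unk>"]
VOCAB = {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS))}


def tokenize(sequence):
    """Split a protein sequence into per-residue tokens (uppercased)."""
    return list(sequence.strip().upper())


def encode(sequence):
    """Map each residue token to an integer id; unknown residues map to <unk>."""
    unk_id = VOCAB["<unk>"]
    return [VOCAB.get(tok, unk_id) for tok in tokenize(sequence)]


if __name__ == "__main__":
    print(tokenize("mkt"))
    print(encode("AXC"))
```

Per-residue tokens keep a one-to-one mapping between token positions and sequence positions, which is convenient when a prediction is needed for a specific site.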