Protein arginine methylation (PRme), as one post-translational modification, plays a critical role in numerous cellular processes and regulates critical cellular functions. Though several in silico models for predicting PRme sites have been reported, new models may be required to develop due to the significant increase of identified PRme sites. In this study, we constructed multiple machine-learning and deep-learning models. The deep-learning model CNN combined with the One-Hot coding showed the best performance, dubbed CNNArginineMe. CNNArginineMe performed best in AUC scoring metrics in comparisons with several reported predictors. Additionally, we employed CNNArginineMe to predict arginine methylation proteome and performed functional analysis. The arginine methylated proteome is significantly enriched in the amyotrophic lateral sclerosis (ALS) pathway. CNNArginineMe is freely available at https://github.com/guoyangzou/CNNArginineMe.
The in situ post-translational modification (PTM) crosstalk refers to the interactions between different types of PTMs that occur on the same residue site of a protein. The crosstalk sites generally have different characteristics from those with the single PTM type. Studies targeting the latter’s features have been widely conducted, while studies on the former’s characteristics are rare. For example, the characteristics of serine phosphorylation (pS) and serine ADP-ribosylation (SADPr) have been investigated, whereas those of their in situ crosstalks (pSADPr) are unknown. In this study, we collected 3,250 human pSADPr, 7,520 SADPr, 151,227 pS and 80,096 unmodified serine sites and explored the features of the pSADPr sites. We found that the characteristics of pSADPr sites are more similar to those of SADPr compared to pS or unmodified serine sites. Moreover, the crosstalk sites are likely to be phosphorylated by some kinase families (e.g., AGC, CAMK, STE and TKL) rather than others (e.g., CK1 and CMGC). Additionally, we constructed three classifiers to predict pSADPr sites from the pS dataset, the SADPr dataset and the protein sequences separately. We built and evaluated five deep-learning classifiers in ten-fold cross-validation and independent test datasets. We also used the classifiers as base classifiers to develop a few stacking-based ensemble classifiers to improve performance. The best classifiers had the AUC values of 0.700, 0.914 and 0.954 for recognizing pSADPr sites from the SADPr, pS and unmodified serine sites, respectively. The lowest prediction accuracy was achieved by separating pSADPr and SADPr sites, which is consistent with the observation that pSADPr’s characteristics are more similar to those of SADPr than the rest. Finally, we developed an online tool for extensively predicting human pSADPr sites based on the CNNOH classifier, dubbed EdeepSADPr. It is freely available through http://edeepsadpr.bioinfogo.org/. We expect our investigation will promote a comprehensive understanding of crosstalks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.