Junzhe Cao scite author profile

Pupylation, a crucial posttranslational modification in prokaryotes, plays a key role in regulating various protein functions. To understand the molecular mechanism of pupylation, it is important to identify pupylation substrates and sites accurately. Because traditional experimental methods are time-consuming and labor-intensive, several computational methods have been developed to identify pupylation sites. Existing computational methods use the experimentally annotated pupylation sites as the positive training set and the remaining nonannotated lysine residues as the negative training set to build classifiers that predict new pupylation sites in unknown proteins. However, the nonannotated lysine residues may contain pupylation sites that have not yet been experimentally validated. Unlike previous methods, this study uses the experimentally annotated pupylation sites as the positive training set and treats the remaining nonannotated lysine residues as an unlabeled training set. A novel method named PUL-PUP is proposed to predict pupylation sites using a positive-unlabeled learning technique. The experimental results indicated that PUL-PUP significantly outperforms the other methods for the prediction of pupylation sites. As an application, PUL-PUP was also used to predict the most likely pupylation sites among the nonannotated lysine sites.
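The positive-unlabeled setup described in this abstract can be illustrated with a minimal sketch. This is not the PUL-PUP algorithm, whose details the abstract does not give. It swaps in the generic Elkan-Noto correction, which assumes the annotated positives are a random sample of all true positives. The one-dimensional synthetic feature, the small logistic regression, and every function name below are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, x):
    """Estimate P(labeled | x) with a linear logistic model."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(X, s, epochs=300, lr=0.1):
    """Fit labeled (s=1) versus unlabeled (s=0) by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, label in zip(X, s):
            err = score((w, b), x) - label
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def pu_posterior(model, c, x):
    """Elkan-Noto correction: P(positive | x) ~= P(labeled | x) / c."""
    return min(1.0, score(model, x) / c)


# Synthetic stand-in data (hypothetical): one feature per lysine site.
random.seed(0)
true_pos = [[random.gauss(2.0, 1.0)] for _ in range(200)]
true_neg = [[random.gauss(-2.0, 1.0)] for _ in range(200)]

labeled = true_pos[:100]               # annotated sites
unlabeled = true_pos[100:] + true_neg  # hidden positives mixed with negatives

# Hold out some labeled positives to estimate c = P(labeled | positive).
held_out, train_labeled = labeled[:20], labeled[20:]
X = train_labeled + unlabeled
s = [1] * len(train_labeled) + [0] * len(unlabeled)

model = train_logistic(X, s)
c = sum(score(model, x) for x in held_out) / len(held_out)
```

Calling `pu_posterior(model, c, x)` on an unlabeled site then gives a corrected estimate that the site is a true positive, which is one way to rank nonannotated lysines without treating them all as negatives.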


iLM-2L: A two-level predictor for identifying protein lysine methylation sites and their methylation degrees by incorporating K-gap amino acid pairs into Chou's general PseAAC

Cao

2015

Journal of Theoretical Biology


Prediction of protein N-formylation using the composition of k-spaced amino acid pairs

Cao

2017

Analytical Biochemistry


Predicting Viral Protein Subcellular Localization with Chou's Pseudo Amino Acid Composition and Imbalance-Weighted Multi-Label K-Nearest Neighbor Algorithm

Cao, Liu

2012

PPL


Machine learning is a reliable technology for automated subcellular localization of viral proteins within a host cell or virus-infected cell. One challenge is that viral protein samples not only have multiple location sites but are also class-imbalanced, and an imbalanced dataset often decreases prediction performance. To address this challenge, this paper proposes a novel approach named imbalance-weighted multi-label K-nearest neighbor to predict viral protein subcellular location with multiple sites. The experimental results of the jackknife test indicate that the presented algorithm achieves better performance than the existing methods and has great potential in protein science.
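The abstract does not spell out how the imbalance weighting is computed, so the following is only a rough sketch of the general idea, not the paper's algorithm and not the standard ML-KNN procedure. It assumes a simple per-label vote among the k nearest neighbors in which "has the label" and "lacks the label" votes are each scaled by the inverse frequency of that outcome in the training set. The toy coordinates, location names, and function names are all hypothetical.

```python
import math


def label_weights(Y, labels):
    """Per label: (weight for 'has label', weight for 'lacks label').

    Each weight is inversely proportional to how often that outcome
    occurs in the training set, so rare labels get stronger votes.
    """
    n = len(Y)
    weights = {}
    for label in labels:
        pos = sum(1 for ys in Y if label in ys)
        neg = n - pos
        weights[label] = (n / (2 * pos) if pos else 0.0,
                          n / (2 * neg) if neg else 0.0)
    return weights


def predict(X, Y, query, k=3):
    """Return the set of labels whose weighted 'yes' votes beat the 'no' votes."""
    labels = sorted(set().union(*Y))
    weights = label_weights(Y, labels)
    # Indices of the k training points closest to the query (Euclidean).
    neighbors = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    predicted = set()
    for label in labels:
        w_pos, w_neg = weights[label]
        vote_pos = sum(w_pos for i in neighbors if label in Y[i])
        vote_neg = sum(w_neg for i in neighbors if label not in Y[i])
        if vote_pos > vote_neg:
            predicted.add(label)
    return predicted


# Toy training set (hypothetical): 2-D features, multi-label locations.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6]]
Y = [{"cytoplasm"}, {"cytoplasm"}, {"cytoplasm"},
     {"cytoplasm", "nucleus"}, {"nucleus"}, {"nucleus", "membrane"}]

near_rare = predict(X, Y, [5, 5.4])       # query close to the rare "membrane" sample
near_common = predict(X, Y, [0.2, 0.2])   # query inside the common "cytoplasm" cluster
```

In this toy data, "membrane" appears on only one of the three nearest neighbors of the first query, so an unweighted majority vote would drop it, while the inverse-frequency weights let the rare label through.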



Junzhe Cao

Predicting lysine phosphoglycerylation with fuzzy SVM by incorporating k-spaced amino acid pairs into Chou's general PseAAC

Positive-Unlabeled Learning for Pupylation Sites Prediction
