2018
DOI: 10.1016/j.gpb.2018.08.004
Integration of A Deep Learning Classifier with A Random Forest Approach for Predicting Malonylation Sites

Abstract: As a newly-identified protein post-translational modification, malonylation is involved in a variety of biological functions. Recognizing malonylation sites in substrates represents an initial but crucial step in elucidating the molecular mechanisms underlying protein malonylation. In this study, we constructed a deep learning (DL) network classifier based on long short-term memory (LSTM) with word embedding (LSTMWE) for the prediction of mammalian malonylation sites. LSTMWE performs better than traditional cl…

Cited by 71 publications (56 citation statements)
References 42 publications (58 reference statements)
“…The EAAC encoding [26,[32][33][34] introduces a fixed-length sliding window based on the encoding of Amino Acid Composition (AAC), which calculates the frequency of each type of amino acids in a protein or peptide sequence [35]. EAAC is calculated by continuously sliding a fixed-length sequence window (using the default value as 5) from the N-terminus to the C-terminus of each peptide.…”
Section: Enhanced Amino Acid Composition (EAAC) (mentioning)
confidence: 99%
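The EAAC encoding described above is straightforward to implement: slide a fixed-length window (default length 5) along the peptide and record the per-window frequency of each amino acid type. The following is a minimal sketch, not the cited authors' implementation; the function name `eaac_encode` and the example peptide are illustrative assumptions.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def eaac_encode(peptide, window=5):
    """Enhanced Amino Acid Composition (EAAC) sketch: slide a fixed-length
    window from the N-terminus to the C-terminus and, for each window
    position, record the frequency of every amino acid type."""
    features = []
    for start in range(len(peptide) - window + 1):
        counts = Counter(peptide[start:start + window])
        # One frequency per amino acid type, in a fixed order
        features.extend(counts.get(aa, 0) / window for aa in AMINO_ACIDS)
    return features

# Hypothetical 9-residue peptide: 5 window positions x 20 amino acids = 100 features
vec = eaac_encode("ACDKKGHIL")
```

Each window contributes a 20-dimensional frequency block, so the feature length grows linearly with peptide length, unlike plain AAC, which yields a single 20-dimensional vector regardless of sequence length.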
“…Khib sites were considered positives whereas the remaining K sites were taken as negatives. We further estimated the potential redundancy of the positive sites by extracting the peptide segment of seven residues with the Khib site in the center and count the number of unique segments [20,25]. The number (9,444) through ten-fold cross-validation ( Fig.…”
Section: Dataset Collection (mentioning)
confidence: 99%
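The redundancy check quoted above (extracting the seven-residue segment centered on each modified lysine and counting unique segments) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the cited study's code; `unique_site_segments`, the flank width of 3 (giving 7-residue segments), and the toy sequence are all assumptions.

```python
def unique_site_segments(sequence, site_positions, flank=3):
    """Extract the (2*flank + 1)-residue segment centered on each
    modification site and collect the unique segments, as a simple
    way to gauge redundancy among positive sites."""
    segments = set()
    for pos in site_positions:  # 0-based index of a lysine (K) site
        # Skip sites too close to either terminus for a full segment
        if pos - flank >= 0 and pos + flank < len(sequence):
            segments.add(sequence[pos - flank:pos + flank + 1])
    return segments

# Toy example: only the site at position 4 has a full 7-residue segment
segs = unique_site_segments("MKAAKLLKGG", [1, 4, 7])
```

Comparing the number of unique segments against the number of sites indicates how many positives share an identical local context.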
“…The DL models included a Gated Recurrent Unit (GRU) model with the word-embedding encoding approach dubbed GRUWE and two CNN models with the one-hot and word-embedding encoding approaches named CNNOH and CNNWE, respectively. Both encoding methods are common in the DL algorithms [20,25].…”
Section: CNNOH Showed Superior Performance (mentioning)
confidence: 99%
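The two input encodings named above (one-hot, as in CNNOH, and word embedding, as in CNNWE/GRUWE) differ only in how residues are mapped to model inputs. A minimal sketch of both, assuming a 20-letter alphabet and NumPy; the function names and shapes are illustrative, not taken from the cited models:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(peptide):
    """One-hot encoding: each residue becomes a 20-dimensional
    indicator vector (the CNNOH-style input)."""
    mat = np.zeros((len(peptide), len(AMINO_ACIDS)))
    for i, aa in enumerate(peptide):
        mat[i, AA_INDEX[aa]] = 1.0
    return mat

def integer_encode(peptide):
    """Integer encoding: each residue becomes an index, the form
    consumed by a trainable embedding layer (word-embedding-style
    input, as in CNNWE/GRUWE)."""
    return np.array([AA_INDEX[aa] for aa in peptide])
```

One-hot inputs are fixed and sparse, while integer indices let an embedding layer learn a dense residue representation during training, which is why both encodings are common in DL sequence models.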