Machine-Learning-Based Prediction of Cell-Penetrating Peptides and Their Uptake Efficiency with Improved Accuracy

Manavalan, Balachandran; Subramaniyam, Sathiyamoorthy; Shin, Tae Hwan; Kim, Myeong Ok; Lee, Gwang

doi:10.1021/acs.jproteome.8b00148

Cited by 164 publications

(133 citation statements)

References 64 publications

(137 reference statements)

Supporting

Mentioning

133

Contrasting


“…Twenty‐eight clusters were obtained. Among each cluster, the variable with the smallest ratio of 1‐R (Manavalan et al., ) was selected as a representative variable. Results of principal component analysis were showed in Figure , where values of the 28 variables for the training set and test set as well as the validation set occupied similar chemical space and they were significantly different from each other, indicating that the effectiveness of variable selection.…”

Section: Results (mentioning)

confidence: 99%

“…First, methodological evaluation was carried out to identify appropriate methods. Specifically, eight machine learning algorithms (Ericksen et al, 2017;Friedman, 2002;Manavalan et al, 2018;Siramshetty, Chen, Devarakonda, & Preissner, 2018) including DT, kNN, SVM, RF, ERT, AdaBoost, GBT, and XGBoost were evaluated and comprehensively compared through an application case of discriminating ACC inhibitors from decoys. Then, machine learning methods and traditional structure-based drug discovery were organically combined to construct a robust strategy for the discovery of ACC inhibitors.…”

Section: Introduction (mentioning)

confidence: 99%


A combined drug discovery strategy based on machine learning and molecular docking

et al. 2019


Data mining methods based on machine learning play an increasingly important role in drug design and discovery. In the current work, eight machine learning methods including decision trees, k‐Nearest neighbor, support vector machines, random forests, extremely randomized trees, AdaBoost, gradient boosting trees, and XGBoost were evaluated comprehensively through a case study of ACC inhibitor data sets. Internal and external data sets were employed for cross‐validation of the eight machine learning methods. Results showed that the extremely randomized trees model performed best and was adopted as the first step of virtual screening. Together with structure‐based virtual screening in the second step, this combined strategy obtained desirable results. This work indicates that the combination of machine learning methods with traditional structure‐based virtual screening can effectively strengthen the ability in finding potential hits from large compound database for a given target.
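The abstract above describes comparing several learners by cross-validation and keeping the best one. A minimal sketch of that workflow follows, assuming plain k-fold cross-validation and mean held-out accuracy as the selection criterion. The two classifiers (a majority-class baseline and 1-nearest-neighbor), the toy data, and every function name are illustrative stand-ins, not the eight algorithms or the ACC inhibitor data sets used in the cited work.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved, nearly equal folds."""
    return [list(range(i, n, k)) for i in range(k)]

def majority_classifier(train_X, train_y):
    """Baseline stand-in: always predict the most common training label."""
    label = max(sorted(set(train_y)), key=train_y.count)
    return lambda x: label

def nearest_neighbor_classifier(train_X, train_y):
    """1-NN stand-in using squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
        return train_y[dists.index(min(dists))]
    return predict

def cross_val_accuracy(make_model, X, y, k=5):
    """Mean held-out accuracy over k folds."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        model = make_model(train_X, train_y)
        correct = sum(model(X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return mean(scores)

# Made-up data: two well-separated clusters of 20 points each in 2-D.
X = [(0.1 * i, 0.1 * (i % 3)) for i in range(20)] + \
    [(5 + 0.1 * i, 5 + 0.1 * (i % 3)) for i in range(20)]
y = [0] * 20 + [1] * 20

# Evaluate every candidate with the same folds, then keep the best scorer.
candidates = {
    "majority": majority_classifier,
    "1-NN": nearest_neighbor_classifier,
}
results = {name: cross_val_accuracy(make, X, y) for name, make in candidates.items()}
best = max(results, key=results.get)
print(results, "best:", best)
```

Evaluating every candidate on identical folds keeps the comparison fair; in practice an external data set, as the abstract mentions, would give a further check on the selected model.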


“…(3-5, 26-28) These previous predictors were mostly trained using physicochemical descriptors, with datasets obtained from non-standardized experiments and containing only canonical residues. (6, 29-31) Inclusion of chemically diverse unnatural moieties is challenging because such physicochemical descriptors may not be readily available. The ability to encode for unnatural residues, however, would greatly expand the chemical search space, and Peptide sequences are represented as row matrices comprised of residue fingerprints.…”

Section: Developing the Machine Learning Model (mentioning)

confidence: 99%

Interpretable Deep Learning for De Novo Design of Cell-Penetrating Abiotic Polymers

Schissel, Mohapatra, Wolfe, et al. 2020

Preprint


There are more amino acid permutations within a 40-residue sequence than atoms on Earth. This vast chemical search space hinders the use of human learning to design functional polymers. Here we couple supervised and unsupervised deep learning with high-throughput experimentation to drive the design of high-activity, novel sequences reaching 10 kDa that deliver antisense oligonucleotides to the nucleus of cells. The models, in which natural and unnatural residues are represented as topological fingerprints, decipher and visualize sequence-activity predictions. The new variants boost antisense activity by 50-fold, are effective in animals, are nontoxic, and can also deliver proteins into the cytosol. Machine learning can discover functional polymers that enhance cellular uptake of biotherapeutics, with significant implications toward developing therapies for currently untreatable diseases.

One sentence summary: Deep learning generates de novo large functional abiotic polymers that deliver antisense oligonucleotides to the nucleus.

… typically occurs by energy-dependent uptake, meaning that endosomal escape presents an additional challenge. While some peptides can efficiently escape the endosome, designing a novel CPP sequence for this task is nearly impossible. In addition to the diversity of physicochemical properties of CPPs, variation in experimental design has resulted in inconsistent and sometimes contradictory datasets. (22) These inconsistencies preclude establishing sequence-activity relationships to guide the design of next-generation CPPs and can be remedied by testing PMO-CPP conjugates in a nuclear delivery-based assay that provides quantitative activity data and selects for sequences that can escape the endosome. In order to uncover CPP design principles for PMO delivery, it is necessary to have a standardized, biologically relevant dataset with which to train machine learning models.
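The opening claim of the abstract above can be checked with a short calculation. The sketch below assumes an alphabet of only the 20 canonical amino acids (unnatural residues would enlarge the space further) and a rough, commonly quoted estimate of about 1.33e50 atoms on Earth; neither number is taken from the cited preprint.

```python
import math

NUM_RESIDUE_TYPES = 20    # assumption: canonical amino acids only
SEQUENCE_LENGTH = 40      # length named in the abstract
ATOMS_ON_EARTH = 1.33e50  # assumption: rough order-of-magnitude estimate

# Each position can independently hold any residue type, so the count
# of distinct sequences is alphabet_size ** length.
num_sequences = NUM_RESIDUE_TYPES ** SEQUENCE_LENGTH
ratio = num_sequences / ATOMS_ON_EARTH

print(f"distinct 40-residue sequences: {num_sequences:.3e}")
print(f"log10 of that count: {math.log10(num_sequences):.2f}")
print(f"sequences per atom on Earth: {ratio:.1f}")
```

Under these assumptions the sequence count is on the order of 1e52, roughly two orders of magnitude above the atom estimate, which is consistent with the abstract's comparison.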


“…Results from the compositional analyses suggested that integrating the amino acid preference information would be helpful for differentiating between DHSs and non-DHSs, and so, we used these as input features for ML methods to improve classification. The major advantage of ML methods is their ability to consider multiple features simultaneously, often capturing hidden relationships [16-23].…”

Section: Compositional Analysis (mentioning)

confidence: 99%

“…In the second step of the previous section, we used three different ML-based methods instead of SVM, including, RF, ET, and k-NN. A detailed description of the development of prediction models using these methods was provided in our recent studies [21,23]. For each ML-based method, we generated 33 prediction models using different sets of features, including individual composition, hybrid models, and features based on FIS cut-off.…”

Section: Comparison of Three ML-Based Models with the SVM-Based Model (mentioning)

confidence: 99%

DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest

Manavalan, Shin, Lee 2017

Preprint

Self Cite


DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html. (not peer-reviewed)
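The abstract above reports performance as a Matthews correlation coefficient (MCC) and an accuracy. As a reminder of how these two metrics are derived from a binary confusion matrix, here is a minimal sketch using their standard definitions. The counts are hypothetical and are not DHSpred results; the function names are illustrative.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1].

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical confusion-matrix counts, for illustration only.
tp, tn, fp, fn = 90, 80, 20, 10
acc_value = accuracy(tp, tn, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
print(f"accuracy = {acc_value:.3f}, MCC = {mcc_value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are imbalanced, which is one reason the two are often reported together.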
