Alignment-Free Antimicrobial Peptide Predictors: Improving Performance by a Thorough Analysis of the Largest Available Data Set

Pinacho-Castellanos, Sergio A.; García-Jacas, César R.; Gilson, Michael K.; Brizuela, Carlos A.

doi:10.1021/acs.jcim.1c00251

Cited by 32 publications

(71 citation statements)

References 104 publications


“… Target or calibration databases: We considered five databases of APPs and non-APPs reported in ref. 30 There were different balanced and unbalanced datasets stored in five FASTA files with thousands of labeled APPs and non-APPs (SI1C-G).…”

Section: Description Of Models (mentioning)

confidence: 99%

“…To assess the relative performance of the mQSSMs, we used the five data sets of APPs and non-APPs recently provided in ref. 30 These datasets were obtained from starPepDB, whose description can be found in https://biocom-ampdiscover.cicese.mx/dataset. Each set of queries and similarity thresholds were wrapped into a calibration algorithm, comprising a modified virtual screening simulation technique.…”

Section: Selection Of the Best Of Models And Comparisons With ML APPs Prediction Servers (mentioning)

confidence: 99%

“…Each set of queries and similarity thresholds were wrapped into a calibration algorithm, comprising a modified virtual screening simulation technique. 30 In these models, we used just the queries' subset of APPs as the multi-query calibration group, while the active and inactive subsets were the target datasets. The prediction ensemble, composed of similarity scores of each peptide D in the target dataset with each query Q, was ordered with the MAX-SIM multi-classifier.…”

Section: Selection Of the Best Of Models And Comparisons With ML APPs Prediction Servers (mentioning)

confidence: 99%

“…62 After applying these statistical tests, we obtained the best mQSSM. A second comparison was carried out to compare our best similarity searching models with ML-based models reported in the literature for APP prediction 30,31 by using the same five calibration datasets.…”

Section: Selection Of the Best Of Models And Comparisons With ML APPs Prediction Servers (mentioning)

confidence: 99%

“…29 It is worth mentioning this is the first time we are exploring the chemical space from starPepDB to retrieve valuable information derived from AMPs. In addition, we evaluated the multi-query similarity searching models (mQSSMs) performance with five benchmarking data sets of APP/non-APPs, and we compared these results with performance metrics of ML APPs prediction servers AMPDiscover (https://biocom-ampdiscover.cicese.mx) 30 and AMPFun (http://fdblab.csie.ncu.edu.tw/AMPfun/index.html). 31…”

Section: Introduction (mentioning)

confidence: 99%


Exploring the Chemical Space of Antiparasitic Peptides and Discovery of New Promising Leads through a Novel Approach based on Network Science and Similarity Searching

Ayala-Ruano; Marrero-Ponce; Aguilera-Mendoza; et al. 2021

Preprint


Antimicrobial peptides (AMPs) are small bioactive chemicals that have appeared as promising compounds to treat a wide range of diseases. The effectiveness of AMPs resides in the wide range of mechanisms they can use for both killing microbes and modulating immune responses. However, the AMPs’ chemical space (AMPCS) is huge: it is estimated that there exist more than 10^65 unique sequences of peptides with 50 residues or fewer, which represents a big challenge for the discovery of new promising sequences and the identification of common features, motifs, or relevant biological functions shared by these peptides. Therefore, we present a new approach based on network science and similarity searching to discover new potential AMPs, specifically antiparasitic peptides (APPs). We have taken advantage of a network-based representation of the APPs’ chemical space (APPCS) to retrieve valuable information, using three types of networks: chemical space (CSN), half-space proximal (HSPN), and metadata (METN). Some centrality measures were applied to identify the most important and non-redundant nodes, and these peptides were taken as queries (Qs) against the graph database starPepDB to discover new potential APPs with similarity searching by group fusion (MAX-SIM rule) models. We evaluated the performance of the multi-query similarity searching models (mQSSMs) with five benchmarking data sets of APP/non-APPs. The predictions of the best mQSSMs show strong-to-very-strong predictive agreement, since their external Matthews correlation coefficient (MCC) values ranged from 0.834 to 0.965. Outstanding outcomes were attained by the mQSSM with 219 Qs from both networks CSN and HSPN (219Q_0.5_HB-HC-Singletons_CSN-HSPN) using 0.5 as the similarity threshold, with MCC values greater than 0.85 in external datasets. Then, we compared the performance metrics of our mQSSMs with the APP prediction servers AMPDiscover and AMPFun. The model proposed in this report outperformed the machine learning approaches with statistically significant differences, showing the enormous potential of this method. After applying our method and additional filters, we proposed 95 repurposed leads as potential APPs, which have not been associated with this activity until now. In addition, we explored sequence similarities and motifs shared by these peptides, which can serve as templates for searching and designing new promising APPs. The analyses show that the similarity models proposed in this study could contribute to identifying APPs with high effectiveness and reliability. Our models and pipeline are freely available through the starPep toolbox software at http://mobiosd-hub.com/starpep.
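The abstract above describes group fusion by the MAX-SIM rule, a similarity threshold of 0.5, and evaluation by MCC. The following is a minimal sketch of that idea only, not the authors' implementation: the similarity function (Jaccard overlap of 2-mers), the peptide strings, and all function names are hypothetical choices made for illustration, and the actual similarity measure used with starPepDB is not specified in this listing.

```python
from math import sqrt


def kmer_set(seq, k=2):
    """Set of overlapping k-mers in a sequence (toy featurization, an assumption)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_similarity(a, b):
    """Jaccard similarity of 2-mer sets; a hypothetical stand-in for the real metric."""
    ka, kb = kmer_set(a), kmer_set(b)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def max_sim_scores(queries, targets, similarity=toy_similarity):
    """MAX-SIM group fusion: each target keeps its highest similarity to any query."""
    return [max(similarity(q, t) for q in queries) for t in targets]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for boolean labels (0.0 if undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented toy data: two query peptides, four labeled targets.
QUERIES = ["GIGKFLHS", "KLAKLAKK"]
TARGETS = ["GIGKFLKS", "KLAKLAKA", "DDEESSTT", "PPGPPGPP"]
LABELS = [True, True, False, False]

THRESHOLD = 0.5  # similarity threshold quoted in the abstract
SCORES = max_sim_scores(QUERIES, TARGETS)
PREDICTIONS = [score >= THRESHOLD for score in SCORES]

print(SCORES, PREDICTIONS, mcc(LABELS, PREDICTIONS))
```

Taking the maximum over queries means a target is flagged as soon as it resembles any single query, which is why the choice and non-redundancy of the query set matter for this kind of model.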


Sequential Properties Representation Scheme for Recurrent Neural Network-Based Prediction of Therapeutic Peptides

Otović; Njirjak; Kalafatović; et al. 2022

J. Chem. Inf. Model.


The discovery of therapeutic peptides is often accelerated by means of virtual screening supported by machine learning-based predictive models. The predictive performance of such models is sensitive to the choice of data and its representation scheme. While the peptide physicochemical and compositional representations fail to distinguish sequence permutations, the amino acid arrangement within the sequence lacks the important information contained in physicochemical, conformational, topological, and geometrical properties. In this paper, we propose a solution to the identified information gap by implementing a hybrid scheme that combines the best traits of both approaches, with the aim of predicting antimicrobial and antiviral activities based on experimental data from the DRAMP 2.0, AVPdb, and UniProt data repositories. Using the Friedman test of statistical significance, we compared our hybrid, sequential properties approach to peptide properties, one-hot vector encoding, and word embedding schemes in the 10-fold cross-validation setting, with respect to the F1 score, Matthews correlation coefficient, geometric mean, recall, and precision evaluation metrics. Moreover, the sequence modeling neural network was employed to gain insight into the synergic effect of both properties- and amino acid order-based predictions. The results suggest that sequential properties significantly (P < 0.01) surpass the aforementioned state-of-the-art representation schemes. This makes the scheme a strong candidate for increasing the predictive power of screening methods based on machine learning, applicable to any category of peptides.
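The information gap named in this abstract can be illustrated with a small sketch. It is an assumption-laden toy, not the paper's scheme: the Kyte-Doolittle hydropathy scale is used here as a single stand-in property, and the function names are hypothetical. A compositional representation gives identical features for two permuted sequences, while a per-residue (sequential) property vector keeps the order.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values, used only as an example property
# (an assumption; the abstract does not list the properties it uses).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition_feature(seq):
    """Amino acid counts: order-invariant, so permutations look identical."""
    return Counter(seq)


def sequential_feature(seq, scale=KYTE_DOOLITTLE):
    """One property value per residue, in sequence order."""
    return [scale[residue] for residue in seq]


# Two toy peptides that are permutations of each other.
PEPTIDE_A, PEPTIDE_B = "KLA", "ALK"

print(composition_feature(PEPTIDE_A) == composition_feature(PEPTIDE_B))
print(sequential_feature(PEPTIDE_A), sequential_feature(PEPTIDE_B))
```

A list of per-residue values like this is the kind of ordered input a recurrent network can consume step by step, which is the motivation the abstract gives for the hybrid scheme.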


Network Science and Group Fusion Similarity-Based Searching to Explore the Chemical Space of Antiparasitic Peptides

et al. 2022


Antimicrobial peptides (AMPs) have appeared as promising compounds to treat a wide range of diseases. Their clinical potentialities reside in the wide range of mechanisms they can use for both killing microbes and modulating immune responses. However, the hugeness of the AMPs' chemical space (AMPCS), represented by more than 10^65 unique sequences, has represented a big challenge for the discovery of new promising therapeutic peptides and for the identification of common structural motifs. Here, we introduce network science and a similarity searching approach to discover new promising AMPs, specifically antiparasitic peptides (APPs). We exploited the network-based representation of APPs' chemical space (APPCS) to retrieve valuable information by using three network types: chemical space (CSN), half-space proximal (HSPN), and metadata (METN). Some centrality measures were applied to identify in each network the most important and nonredundant peptides. Then, these central peptides were considered as queries (Qs) in group fusion similarity-based searches against a comprehensive collection of known AMPs, stored in the graph database StarPepDB, to propose new potential APPs. The performance of the resulting multiquery similarity-based search models (mQSSMs) was evaluated in five benchmarking data sets of APP/non-APPs. The predictions performed by the best mQSSM showed a strong-to-very-strong performance since their external Matthews correlation coefficient (MCC) values ranged from 0.834 to 0.965. Outstanding MCC values (>0.85) were attained by the mQSSM with 219 Qs from both networks CSN and HSPN with 0.5 as similarity threshold in external data sets. Then, the performance of our best mQSSM was compared with the APPs prediction servers AMPDiscover and AMPFun. The proposed model showed its relevance by outperforming state-of-the-art machine learning models to predict APPs. After applying the best mQSSM and additional filters on the non-APP space from StarPepDB, 95 AMPs were repurposed as potential APP hits. Due to the high sequence diversity of these peptides, different computational approaches were applied to identify relevant motifs for searching and designing new APPs. Lastly, we identified 11 promising APP lead candidates by using our best mQSSMs together with diversity-based network analyses, and 24 web servers for activity/toxicity and drug-like properties. These results support that network-based similarity searches can be an effective and reliable strategy to identify APPs. The proposed models and pipeline are freely available through the StarPep toolbox software at http://mobiosd-hub.com/starpep.
