Ribosomally synthesized and post-translationally modified peptides (RiPPs) constitute a rapidly growing class of natural products with diverse structures and bioactivities. We have developed RiPPMiner, a novel bioinformatics resource for deciphering the chemical structures of RiPPs by genome mining. RiPPMiner derives its predictive power from machine learning-based classifiers trained on a well-curated database of more than 500 experimentally characterized RiPPs. RiPPMiner uses a Support Vector Machine to distinguish RiPP precursors from other small proteins and to classify the precursors into 12 sub-classes of RiPPs. For classes such as lanthipeptides, cyanobactins, lasso peptides and thiopeptides, RiPPMiner can predict the leader cleavage site and the complex cross-links between post-translationally modified residues, starting from genome sequences. RiPPMiner can identify the correct cross-link pattern in a core peptide from among a very large number of combinatorial possibilities. Benchmarking of RiPPMiner on a large lanthipeptide dataset indicated high sensitivity, specificity, accuracy and precision. RiPPMiner also provides interfaces for visualizing the chemical structure, downloading the structure as a simplified molecular-input line-entry system (SMILES) string, and searching for RiPPs with similar sequences or chemical structures. The backend database of RiPPMiner provides information about the modification system, precursor sequence, leader and core sequences, modified residues, cross-links and gene cluster for more than 500 experimentally characterized RiPPs. RiPPMiner is available at http://www.nii.ac.in/rippminer.html.
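To give a feel for the "very large number of combinatorial possibilities" mentioned above, the following is a minimal Python sketch that enumerates candidate thioether cross-link patterns for a lanthipeptide-like core peptide. It is not RiPPMiner's algorithm. It assumes, as a deliberate simplification, that every Cys pairs with exactly one distinct Ser/Thr and that no other constraints apply. The function names and the toy sequence are hypothetical.

```python
from itertools import permutations
from math import perm


def residue_positions(core: str):
    """Return (Ser/Thr positions, Cys positions) for a core peptide string."""
    donors = [i for i, aa in enumerate(core) if aa in "ST"]
    cysteines = [i for i, aa in enumerate(core) if aa == "C"]
    return donors, cysteines


def candidate_crosslinks(core: str):
    """Yield every pattern that pairs each Cys with a distinct Ser/Thr.

    Simplifying assumption (not from the source): all Cys residues are
    cross-linked and no positional or ring-size constraint is applied.
    Each pattern is a tuple of (Ser/Thr index, Cys index) pairs.
    """
    donors, cysteines = residue_positions(core)
    for chosen in permutations(donors, len(cysteines)):
        yield tuple(zip(chosen, cysteines))


def count_patterns(core: str) -> int:
    """Count the patterns without enumerating them: s! / (s - c)!."""
    donors, cysteines = residue_positions(core)
    return perm(len(donors), len(cysteines))


toy_core = "SACTC"  # hypothetical toy sequence, not a real RiPP
for pattern in candidate_crosslinks(toy_core):
    print(pattern)
print(count_patterns(toy_core))  # 2 candidate patterns for the toy core

# Under the same assumption, 8 Ser/Thr and 5 Cys already give 6720 candidates.
print(perm(8, 5))
```

Even under these simplified rules the candidate count grows factorially with the number of modifiable residues, which is why a classifier that ranks candidate patterns is useful.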
Genome-guided discovery of novel natural products has been a promising approach for identifying new bioactive compounds. The SBSPKS web server has been a valuable resource for the analysis of polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) gene clusters. We have developed an updated version, SBSPKSv2, which is based on a comprehensive analysis of sequence, structure and secondary metabolite chemical structure data from 311 experimentally characterized PKS/NRPS gene clusters with known biosynthetic products. A completely new feature of SBSPKSv2 is search in chemical space: the user can compare the chemical structure of a given secondary metabolite to the chemical structures of biosynthetic intermediates and final products. For identification of catalytic domains, SBSPKS now uses profile-based searches, which are computationally faster and have high sensitivity. HMM profiles have also been added for a number of new domains, and motif information is used to distinguish condensation (C), epimerization (E) and cyclization (Cy) domains of NRPS. In summary, the new and updated SBSPKSv2 is a versatile tool for genome mining and for analysis of polyketide and non-ribosomal peptide biosynthetic pathways in chemical space. The server is available at: http://www.nii.ac.in/sbspks2.html.
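The abstract does not say how SBSPKSv2 compares chemical structures. One common way to search in chemical space is to rank library compounds by the Tanimoto coefficient between structural fingerprints, sketched below in Python. This is an illustrative assumption, not the server's documented method. Fingerprints are represented as plain sets of feature identifiers, and the compound names and feature values are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint feature sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rank_by_similarity(query: set, library: dict) -> list:
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one substructure feature.
library = {
    "intermediate_1": {1, 2, 3},
    "intermediate_2": {2, 3, 4},
    "final_product": {7, 8},
}
query = {1, 2, 3}

for name, score in rank_by_similarity(query, library):
    print(f"{name}\t{score:.2f}")
```

In practice the feature sets would come from a cheminformatics toolkit rather than being written by hand; the ranking step stays the same.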
Motivation: Even though genome mining tools have successfully identified large numbers of nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) biosynthetic gene clusters (BGCs) in bacterial genomes, no tool can currently predict the chemical structure of the secondary metabolites biosynthesized by these BGCs. The lack of algorithms for predicting the complex macrocyclization patterns of linear PK/NRP biosynthetic intermediates has been the major bottleneck in deciphering the final bioactive chemical structures of PKs/NRPs by genome mining. Results: Using a large dataset of known chemical structures of macrocyclized PKs/NRPs, we have developed a machine learning (ML) algorithm for distinguishing the correct macrocyclization pattern of a PK/NRP from the library of all theoretically possible cyclization patterns. Benchmarking of this ML classifier on completely independent datasets yielded ROC-AUC and PR-AUC values of 0.82 and 0.81, respectively. This cyclization prediction algorithm has been used to develop SBSPKSv3, a genome mining tool for completely automated prediction of macrocyclized structures of NRPs/PKs. SBSPKSv3 has been extensively benchmarked on a dataset of over 100 BGCs with known PK/NRP products. Availability and implementation: The macrocyclization prediction pipeline and all the datasets used in this study are freely available at http://www.nii.ac.in/sbspks3.html. Supplementary information: Supplementary data are available at the journal site online.
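For readers unfamiliar with the two reported metrics, the following Python sketch shows how ROC-AUC and a PR-AUC estimate (average precision) can be computed when a classifier scores candidate cyclization patterns, with label 1 for the correct pattern and 0 for an incorrect one. It is a generic illustration, not the SBSPKSv3 benchmarking code. The labels and scores are made-up toy values, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, a common step-wise estimate of PR-AUC.

    Assumes distinct scores; with tied scores the result depends on
    the order in which tied items are ranked.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    if hits == 0:
        raise ValueError("at least one positive is required")
    return total / hits


# Toy data: 1 = correct cyclization pattern, 0 = incorrect candidate.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores))            # 0.75
print(average_precision(labels, scores))  # (1/1 + 2/3) / 2, about 0.83
```

PR-AUC is worth reporting alongside ROC-AUC in this setting because the correct pattern is typically far outnumbered by incorrect candidates, and precision-recall curves are more sensitive to that imbalance.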