Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represents a major addition to in-silico BGC identification.
The current archives of LAMOST multi-object spectrograph contain millions of fully reduced spectra, from which the automatic pipelines have produced catalogues of many parameters of individual objects, including their approximate spectral classification. This is, however, mostly based on the global shape of the whole spectrum and on integral properties of spectra in given bandpasses, namely presence and equivalent width of prominent spectral lines, while for identification of some interesting object types (e.g. Be stars or quasars) the detailed shape of only a few lines is crucial. Here the machine learning is bringing a new methodology capable of improving the reliability of classification of such objects even in boundary cases.We present results of Spark-based semi-supervised machine learning of LAMOST spectra attempting to automatically identify the single and double-peak emission of Hα line typical for Be and B[e] stars. The labelled sample was obtained from archive of 2m Perek telescope at Ondřejov observatory. A simple physical model of spectrograph resolution was used in domain adaptation to LAMOST training domain. The resulting list of candidates contains dozens of Be stars (some are likely yet unknown), but also a bunch of interesting objects resembling spectra of quasars and even blazars, as well as many instrumental artefacts. The verification of a nature of interesting candidates benefited considerably from cross-matching and visualisation in the Virtual Observatory environment.
Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers more accurate BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing tools. We supplemented this with downstream random forest classifiers that accurately predicted BGC product classes and potential chemical activity.Application of DeepBGC to bacterial genomes uncovered previously undetectable BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represents a significant step forward for in-silico BGC identification.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.