This report describes the first application of the novel NMR-based machine learning tool “Small Molecule Accurate Recognition Technology” (SMART 2.0) for mixture analysis and subsequent accelerated discovery and characterization of new natural products. The concept was applied to the extract of a filamentous marine cyanobacterium known to be a prolific producer of cytotoxic natural products. This environmental Symploca extract was coarsely fractionated, and the fractions were then prioritized, and their purification guided, by cancer cell cytotoxicity, NMR-based SMART 2.0, and MS²-based molecular networking. This led to the isolation and rapid identification of a new chimeric swinholide-like macrolide, symplocolide A, as well as the annotation of swinholide A, samholides A–I, and several new derivatives. The planar structure of symplocolide A was confirmed to be a structural hybrid between swinholide A and luminaolide B by 1D/2D NMR and LC-MS² analysis. A second example applies SMART 2.0 to the characterization of structurally novel cyclic peptides and compares this approach to the recently reported “atomic sort” method. This study exemplifies the revolutionary potential of combined traditional and deep learning-assisted analytical approaches to overcome longstanding challenges in natural products drug discovery.
Various algorithms comparing 2D NMR spectra have been explored for their ability to dereplicate natural products as well as determine molecular structures. However, spectroscopic artefacts, solvent effects, and the interactive effect of functional group(s) on chemical shifts combine to hinder their effectiveness. Here, we leveraged Non-Uniform Sampling (NUS) 2D NMR techniques and deep Convolutional Neural Networks (CNNs) to create a tool, SMART, that can assist in natural products discovery efforts. First, an NUS heteronuclear single quantum coherence (HSQC) pulse sequence was adapted to a state-of-the-art nuclear magnetic resonance (NMR) instrument, and data reconstruction methods were optimized. Second, a deep CNN with contrastive loss was trained on a database containing over 2,054 HSQC spectra as the training set. To demonstrate the utility of SMART, several newly isolated compounds were automatically located with their known analogues in the embedded clustering space, thereby streamlining the discovery pipeline for new natural products.
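The two ideas in the abstract above, training with a contrastive loss and locating a new compound next to its known analogues in the embedding space, can be illustrated with a minimal sketch. This is not the SMART implementation: the abstract does not give the exact loss, so the widely used pairwise margin form is assumed here, and the function names, the two-dimensional toy embeddings, and the compound labels are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Pairwise contrastive loss (assumed margin form, not taken from the paper).

    Similar pairs are penalised by their squared distance; dissimilar pairs
    are penalised only when they lie closer together than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def nearest_known(query, library, k=2):
    """Return the names of the k library entries closest to the query embedding."""
    ranked = sorted(library.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical 2-D embeddings; a real model would embed HSQC spectra
# into a higher-dimensional space.
library = {
    "known_macrolide_1": [0.0, 0.0],
    "known_macrolide_2": [0.1, 0.0],
    "known_peptide_1": [2.0, 2.0],
}
query = [0.02, 0.05]  # embedding of a newly isolated compound (made up)

print(nearest_known(query, library, k=2))
```

In this toy setting the query lands beside the two macrolide entries, which mirrors how a newly isolated compound would be grouped with structurally related known compounds once the network has learned to pull similar spectra together and push dissimilar ones apart.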
Bastimolide A (1), a polyhydroxy macrolide with a 40-membered ring, was isolated from a new genus of the tropical marine cyanobacterium Okeania hirsuta. This novel macrolide was defined by spectroscopy and chemical reactions to possess one 1,3-diol, one 1,3,5-triol, six 1,5-diols, and one tert-butyl group; however, the relationships of these moieties to one another were obscured by a highly degenerate ¹H NMR spectrum. Its complete structure and absolute configuration were therefore unambiguously determined by X-ray diffraction analysis of the nona-p-nitrobenzoate derivative (1d). Pure bastimolide A (1) showed potent antimalarial activity against four resistant strains of Plasmodium falciparum with IC50 values between 80 and 270 nM, although with some toxicity to the control Vero cells (IC50 = 2.1 μM), and thus represents a potentially promising lead for antimalarial drug discovery. Moreover, rigorous establishment of its molecular arrangement gives fresh insight into the structures and biosynthesis of cyanobacterial polyhydroxymacrolides.