Despite rapid evolution in the area of microbial natural products
chemistry, there is currently no open access database containing all
microbially produced natural product structures. Lack of availability
of these data is preventing the implementation of new technologies
in natural products science. Specifically, the development of new computational
strategies for compound characterization and identification is being
hampered by the lack of a comprehensive database of known compounds
against which to compare experimental data. The creation of an open
access, community-maintained database of microbial natural product
structures would enable the development of new technologies in natural
products discovery and improve the interoperability of existing natural
products data resources. However, these data are spread unevenly throughout
the historical scientific literature, including both journal articles
and international patents. These documents have no standard format,
are often not digitized as machine-readable text, and are not publicly
available. Further, these documents lack associated structure
files (e.g., MOL, InChI, or SMILES) and instead depict structures as
images. This makes extraction and formatting of relevant natural
products data a formidable challenge. Using a combination of manual
curation and automated data-mining approaches, we have created a database
of microbial natural products (The Natural Products Atlas, ) that includes
24 594 compounds and contains referenced data for structure,
compound names, source organisms, isolation references, total syntheses,
and instances of structural reassignment. This database is accompanied
by an interactive web portal that permits searching by structure,
substructure, and physical properties. The Web site also provides
mechanisms for visualizing natural products chemical space and dashboards
for displaying author and discovery timeline data. These interactive
tools offer a powerful knowledge base for natural products discovery,
with a central interface for structure- and property-based searching,
and present new perspectives on structural diversity in natural products.
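To make the property-based searching described above concrete, the following is a minimal sketch, assuming a simple in-memory record whose fields mirror the data types the Atlas is said to hold (structure, names, source organism, isolation and synthesis references, structural reassignment). The class name `CompoundRecord`, the functions `search_by_mass` and `search_by_genus`, and the `HYP-` identifiers are hypothetical and are not the Atlas schema or API. The example entries are simple placeholder molecules, not Atlas data, and their genus assignments are invented. Substructure search is deliberately omitted because it needs a cheminformatics toolkit such as RDKit.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundRecord:
    """Hypothetical record mirroring the data types listed in the abstract."""
    atlas_id: str                     # hypothetical identifier format
    names: list[str]                  # compound name(s)
    smiles: str                       # structure stored as a SMILES string
    genus: str                        # source organism genus
    mol_weight: float                 # physical property used for filtering
    isolation_refs: list[str] = field(default_factory=list)
    synthesis_refs: list[str] = field(default_factory=list)
    reassigned: bool = False          # True if the structure was later revised


def search_by_mass(records, low, high):
    """Return records whose molecular weight lies in [low, high], sorted by mass."""
    hits = [r for r in records if low <= r.mol_weight <= high]
    return sorted(hits, key=lambda r: r.mol_weight)


def search_by_genus(records, genus):
    """Case-insensitive exact match on the source organism genus."""
    return [r for r in records if r.genus.lower() == genus.lower()]


# Placeholder entries (simple molecules with invented sources; not Atlas data).
records = [
    CompoundRecord("HYP-0001", ["compound A"], "CCO", "Streptomyces", 46.07),
    CompoundRecord("HYP-0002", ["compound B"], "CC(=O)O", "Aspergillus", 60.05),
    CompoundRecord("HYP-0003", ["compound C"], "c1ccccc1", "Streptomyces", 78.11,
                   reassigned=True),
]

print([r.atlas_id for r in search_by_mass(records, 50, 100)])
# prints ['HYP-0002', 'HYP-0003']
print([r.atlas_id for r in search_by_genus(records, "streptomyces")])
# prints ['HYP-0001', 'HYP-0003']
```

Keeping the structure as a SMILES string alongside the numeric properties means each record can later be handed to a cheminformatics toolkit for structure or substructure queries, while simple property filters run without one.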
The Natural Products Atlas has been developed under FAIR principles
(Findable, Accessible, Interoperable, and Reusable) and is integrated
with other emerging natural product databases, including the Minimum
Information About a Biosynthetic Gene Cluster (MIBiG) repository
and the Global Natural Products Social Molecular Networking (GNPS)
platform. It is designed as a community-supported resource to provide
a central repository for known natural product structures from microorganisms
and is the first comprehensive, open access resource of this type.
It is expected that the Natural Products Atlas will enable the development
of new natural products discovery modalities and accelerate the process
of structural characterization for complex natural products libraries.