2021
DOI: 10.26434/chemrxiv-2021-gxjgc-v2
Preprint

Classifying Natural Products from Plants, Fungi or Bacteria using the COCONUT Database and Machine Learning

Abstract: Natural products (NPs) represent one of the most important resources for discovering new drugs. Here we asked whether NP origin can be assigned from their molecular structure in a subset of 60,171 NPs in the recently reported Collection of Open Natural Products (COCONUT) database assigned to plants, fungi, or bacteria. Visualizing this subset in an interactive tree-map (TMAP) calculated using MAP4 (MinHashed atom pair fingerprint) clustered NPs according to their assigned origin (https://tm.gdb.tools/map4/coco…

Citations: cited by 1 publication (1 citation statement)
References: 60 publications
“…The PubChem database is used by millions of users every month [35]. An example for the usage of the referenced databases is the creation of a classifier that determines whether a Natural Product (NP) originates from fungi, plants, or bacteria based on its chemical structure with data obtained from the COCONUT database [36]. The ZINC database has recently been used for the in silico determination of drug candidates that inhibit the main protease of SARS-CoV-2 [37].…”
Section: The Importance Of Openly Available Resources and Data (citation type: mentioning)
confidence: 99%