2021
DOI: 10.1101/2021.10.05.463235
Preprint

NPOmix: a machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters

Abstract: Microbial natural products, in particular secondary or specialized metabolites, are an important source and inspiration for many pharmaceutical and biotechnological products. However, bioactivity-guided methods widely employed in natural product discovery programs do not explore the full biosynthetic potential of microorganisms, and they usually miss metabolites that are produced at low titer. As a complementary method, the use of genome-based mining in natural products research has facilitated the charting of…

Cited by 5 publications (5 citation statements)
References 74 publications (162 reference statements)
“…Integrative genome-metabolome mining is a complex problem that will require many different solutions and smart ways to integrate those. Combining and streamlining NPLinker with other novel linking methods, such as NPOmix, will be key for advancing this field to understand complex microbial communities and prioritise NP discovery [24]. Other possible routes are to implement additional feature-based scores such as a score based on shared substructures as inferred to be present from the genomic and metabolomic data, the first for example through iPRESTO that finds sub-clusters in BGCs that likely encode for biosynthetic scaffolds or substructures [25], and the latter for example through the use of data-driven approaches that find (MS2LDA) or contain (MotifDB) mass spectral patterns that can be connected to chemical substructures [26,27].…”
Section: Discussion (mentioning)
confidence: 99%
“…Also, the integration of LC-MS/MS data with biological metadata is only the first step in integrative data analysis. The integration of LC-MS/MS metabolomics with other types of omics data, such as metagenomics or transcriptomics, would further strengthen prioritization workflows, as demonstrated with the recently published tools NPOmix [47] or NPLinker [48]. While challenging, such methods promise to further facilitate hypothesis formulation and potentially automate molecular feature prioritization.…”
Section: Discussion (mentioning)
confidence: 99%
“…Integrative genome-metabolome mining is a complex problem that will require many different solutions and smart ways to integrate those. Combining and streamlining NPLinker with other novel linking methods, such as NPOmix, will be key for advancing this field to understand complex microbial communities and prioritise NP discovery [24]. Other possible routes are to implement additional feature-based scores such as on the basis of shared substructures as inferred to be present from the genomic and metabolomic data, the latter for example through the use of data-driven approaches like MS2LDA and MotifDB [25,26].…”
Section: Discussion (mentioning)
confidence: 99%
“…The molecular networks listed in the PoDP were used, while antiSMASH 6 was run on the listed 24 and 11 genomes, respectively. CANOPUS was run for the three datasets within NPLinker with the aforementioned default settings, which took around 24 S4). We decided to do this as it did not hamper our effort of finding the validated links in the three datasets.…”
Section: NPClassScore in NPLinker (mentioning)
confidence: 99%