2021
DOI: 10.1101/2021.05.03.442509
Preprint

Accurate de novo identification of biosynthetic gene clusters with GECCO

Abstract: Biosynthetic gene clusters (BGCs) are enticing targets for (meta)genomic mining efforts, as they may encode novel, specialized metabolites with potential uses in medicine and biotechnology. Here, we describe GECCO (GEne Cluster prediction with COnditional random fields; https://gecco.embl.de), a high-precision, scalable method for identifying novel BGCs in (meta)genomic data using conditional random fields (CRFs). Based on an extensive evaluation of de novo BGC prediction, we found GECCO to be more accurate an…

Cited by 35 publications (42 citation statements)
References 44 publications
“…Further, an improved understanding of the relationship between intra-taxa genomic diversity and BGC content novelty at the base resolution can also highlight which sub-lineages or sub-clades bear the largest reservoir of untapped secondary metabolic potential. Finally, we posit that identifying evolutionary trends of BGCs detected by highly-reliable rule-based approaches, such as antiSMASH 8 , can be used to assess the validity of BGC predictions from newly developed machine learning approaches, such as DeepBGC 11 or GECCO 65 , which we now support usage of in lsaBGC.…”
Section: Discussion (mentioning)
confidence: 78%
“…Pyrodigal has already been used as the implementation for the initial ORF finding stage in several domains, including biosynthetic gene cluster prediction (Carroll et al, 2021), prophage identification (Sirén et al, 2021; Turkington et al, 2021), and pangenome analysis (Hernández et al, 2021).…”
Section: Statement Of Need (mentioning)
confidence: 99%
“…Analyses of genomic sequences provided a hidden glance into organisms’ potential for producing bioactive NPs. Modern strategies in genome mining include in silico methods for BGC identification to facilitate the discovery of novel NPs [ 91 ]. For genome mining, different types of sequence data (e.g., genomics, transcriptomics, metabolomics, proteomics, epigenomics, and multi-omics) [ 10 , 40 , 54 ] and numerous bioinformatics tools are applied [ 67 , 92 ].…”
Section: Genome-mining Tools (mentioning)
confidence: 99%
“…Genome mining search for novel antibiotics, Antibiotic-Resistant Target Seeker (ARTS), is available at (accessed on 12 June 2022) [ 121 ]. Recently, a new machine learning approach, GECCO (GEne Cluster prediction with COnditional random fields; ; accessed on 12 June 2022), allowed much higher identification of de novo BGCs from metagenomics data, confirming the important link for prediction between protein domain and secondary metabolites [ 91 ]. An overview of different bioinformatics tools applied for genome mining and useful in the discovery of BCGs and related secondary metabolites are presented in Table 2 .…”
Section: Genome-mining Tools (mentioning)
confidence: 99%