The growing number of sequenced genomes enables the study of secondary metabolite biosynthetic gene clusters (BGCs) in phyla beyond well-studied soil bacteria. We mined 2627 enterobacterial genomes to detect 8604 BGCs, including nonribosomal peptide synthetases, siderophores, polyketide-nonribosomal peptide hybrids, and 60 other BGC types, with an average of around 3.3 BGCs per genome. These BGCs represented 212 distinct BGC families, of which only 20 have associated products in the MIBiG standard database; these products include siderophores, antibiotics, and genotoxins. Pangenome analysis identified genes associated with a specific BGC encoding the colon cancer-related genotoxin colibactin. In one example, we associated genes involved in the type VI secretion system with the presence of a colibactin BGC in Escherichia. This richness of BGCs in enterobacteria opens up the possibility of discovering novel secondary metabolites and their physiological roles, and provides a guide for identifying and understanding PKS-associated gene sets.
Main
Secondary metabolites produced by a range of microorganisms display medicinally and industrially important properties and mediate microbe-host and microbe-microbe interactions. Secondary metabolite biosynthesis often involves mega-enzymes such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS) that are encoded by large biosynthetic gene clusters (BGCs). Recent advances in genome sequencing technology and genome mining tools have revealed an unexplored richness and diversity of BGCs encoding secondary metabolites 1-4. In addition, the availability of a large number of genomes from the same species has allowed for pangenome analysis, revealing intra-species diversity in traits such as metabolic capabilities 5,6. The focus of many genome-mining-based studies has been well-established secondary metabolite producers, such as bacilli, actinobacteria, or myxobacteria 7,8. In comparison with many of the popular secondary metabolite producers, Escherichia coli and other enterobacteria have more sequenced genomes available, higher-quality genome annotations, comprehensive curated databases, and extensive tools for data analysis. With the exception of Photorhabdus, Xenorhabdus and related genera, which are known to produce a diverse range of secondary metabolites 9,10, enterobacteria are known to produce few secondary metabolites. These include the metal ion chelators enterobactin and yersiniabactin 11,12, the colon cancer-related genotoxin colibactin 13,14, the antibiotic althiomycin 15, the red pigment prodigiosin, and the biosurfactant serrawettin W1 16. Here, we aim to