2017
DOI: 10.1038/nbt.3886

1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life

Abstract: We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previ…

citations
Cited by 217 publications
(179 citation statements)
references
References 73 publications
(69 reference statements)
“…Consequently, current genome repositories are not representative of the microbial diversity known from 16S rRNA gene surveys 4. Concerted efforts are being made to address this limitation by targeting phylogenetically distinct microorganisms for cultivation [5][6][7] and single-cell sequencing 4,8. Although these approaches continue to provide valuable reference genomes, the former is restricted to microorganisms amenable to cultivation and the latter is hampered by technical challenges and the need for specialised equipment 9.…”
mentioning
confidence: 99%
“…The high‐throughput (meta)genomics of saprophytic microorganisms of ecological and biotechnological interest has generated a vast abundance of sequence data on the diverse suites of CAZymes and other proteins that drive biomass degradation (Medie et al., ; El Kaoutari et al., ; Kunath et al., ; Mukherjee et al., ). A common observation is that most saprophytes encode multiple homologs from individual CAZyme families within their genomes, and even the most exacting biochemical analysis performed in vitro can fail to reveal enzyme performance in complex, biologically relevant situations (Cartmell et al., ; Naas et al., ; Zhang et al., ), particularly as all physiological and regulatory context is removed (Forsberg et al., ; Nelson et al., ).…”
Section: Discussion
mentioning
confidence: 97%
“…Numerous projects are contributing to the population of whole‐genome sequences in databases such as the i5K initiative that aims to sequence 5000 arthropod genomes (i5K Consortium, ), the 1000 fungal genome project (approximately 800 fungal genomes are currently available through the U.S. Department of Energy's Joint Genome Institute MycoCosm portal) (Grigoriev et al., ), the GIGA project targeting 7000 noninsect and non‐nematode invertebrates (mostly marine taxa) for sequencing (GIGA Community of Scientists, ), the Genome 10K project that aims to sequence one individual from every vertebrate genus (Koepfli, Paten, & O'Brien, ), the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative that sequenced and released 1000 bacterial and archaeal genomes (Mukherjee et al., ), as well as other projects targeting plant and crop genomes (Li, Wang, & Zeigler, ). All of these data are essential resources for the further development of molecular primers, probes and, in some cases, identification of eDNA sequences generated by other genomic methods discussed in this review.…”
Section: Whole‐genome Sequencing
mentioning
confidence: 99%