2014
DOI: 10.7287/peerj.preprints.554v1
Preprint

CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

Abstract: Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. While this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness a…

Cited by 427 publications (563 citation statements)
References 14 publications
“…We predominantly considered environmental and nonhuman gastrointestinal samples in order to focus on metagenomes likely to contain microbial populations from under-sampled lineages (Supplementary Table 1). The completeness and contamination of each MAG was estimated from the presence and absence of lineage-specific genes expected to be ubiquitous and single copy 25 , and these estimates, along with assembly statistics, used to identify genomes suitable for further study. A total of 64,295 MAGs were obtained, of which 7,903 (7,280 bacterial and 623 archaeal)…”
Section: Results
mentioning
confidence: 99%
“…The correlation between estimated genome completeness and identified tRNAs was positive but weak (Supplementary Fig. 2) as tRNAs are regularly present in multiple copies and often collocated in a genome, making them poor markers for robustly estimating completeness 25,38. Taxonomic distribution of UBA genomes. The phylogenetic relationships of the UBA genomes were determined across bacterial and archaeal trees inferred from three concatenated protein sets: (1) a syntenic block of 16 ribosomal proteins (rp1) recently used to infer genome-based phylogenies 10,30 (Supplementary Table 4), (2) 23 ribosomal proteins (rp2) previously tested for lateral gene transfer 4 (Supplementary Table 5), and (3) 120 bacterial (bac120) and 122 archaeal (ar122) proteins we have identified as being suitable for phylogenetic inference (Supplementary Tables 6 and 7).…”
mentioning
confidence: 99%
“…This list of single-copy genes has been used to estimate genome completeness in several recent studies 13,63 . When we analysed the genome using another metric of genome completeness (checkM 64 ), the results suggested that the genome was 80% complete with 4% contamination, a level categorized as a 'substantially complete draft with low contamination'. This level of completeness is similar to several other recent genomes assembled from metagenomes 65,66 .…”
Section: Methods
mentioning
confidence: 99%
“…This level of completeness is similar to several other recent genomes assembled from metagenomes 65,66 . However, because checkM relies on lineage-specific marker genes, the completeness of genomes without lineage representation can often be underestimated 64 . As there is only one complete genome for the entire class Spartobacteria (C. flavus), the checkM genome completeness estimate for Ca.…”
Section: Methods
mentioning
confidence: 99%
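
The citation statements above describe estimating completeness and contamination from the presence and absence of marker genes expected to be ubiquitous and single copy. The following is a minimal Python sketch of that general idea only; it is not CheckM's implementation, and the function name `estimate_quality`, the marker identifiers, and the simple per-marker formulas are all assumptions made for illustration.

```python
from collections import Counter


def estimate_quality(marker_genes, observed_hits):
    """Illustrative estimate from single-copy marker genes (hypothetical helper).

    marker_genes:  set of marker IDs expected once per genome.
    observed_hits: iterable of marker IDs found in the draft genome,
                   one entry per hit (repeats mean multiple copies).
    Returns (completeness %, contamination %).
    """
    if not marker_genes:
        raise ValueError("marker_genes must not be empty")

    # Count hits per expected marker; hits to unknown IDs are ignored.
    counts = Counter(h for h in observed_hits if h in marker_genes)

    # Completeness: share of expected markers seen at least once.
    present = sum(1 for m in marker_genes if counts[m] >= 1)

    # Contamination: extra copies beyond the expected single copy,
    # relative to the number of expected markers (an assumed formula).
    extra = sum(counts[m] - 1 for m in marker_genes if counts[m] > 1)

    total = len(marker_genes)
    return 100.0 * present / total, 100.0 * extra / total


# Toy example with made-up marker IDs: m5 is missing, m2 appears twice.
markers = {"m1", "m2", "m3", "m4", "m5"}
hits = ["m1", "m2", "m2", "m3", "m4", "x9"]
completeness, contamination = estimate_quality(markers, hits)
```

As one of the quoted statements notes, estimates of this kind depend on how well the chosen marker set represents the genome's lineage, so a poorly represented lineage can yield an underestimated completeness.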