The vast majority of microbial life remains uncatalogued due to the inability to cultivate these organisms in the laboratory. This "microbial dark matter" represents a substantial portion of the tree of life and of the populations that contribute to chemical cycling in many ecosystems. In this work, we leveraged an existing single-cell genomic data set representing the candidate bacterial phylum "Calescamantes" (EM19) to calibrate machine learning algorithms and define metagenomic bins directly from pyrosequencing reads derived from Great Boiling Spring in the U.S. Great Basin. Compared to other assembly-based methods, taxonomic binning with a read-based machine learning approach yielded final assemblies with the highest predicted genome completeness of any method tested. Read-first binning subsequently was used to extract Calescamantes bins from all metagenomes with abundant Calescamantes populations, including metagenomes from Octopus Spring and Bison Pool in Yellowstone National Park and Gongxiaoshe Spring in Yunnan Province, China. Metabolic reconstruction suggests that Calescamantes are heterotrophic, facultative anaerobes, which can utilize oxidized nitrogen sources as terminal electron acceptors for respiration in the absence of oxygen and use proteins as their primary carbon source. Despite their phylogenetic divergence, the geographically separate Calescamantes populations were highly similar in their predicted metabolic capabilities and core gene content, respiring O2 or oxidized nitrogen species for energy conservation in distant but chemically similar hot springs.

The vast majority of the diversity of microbial life on Earth remains undiscovered; the core metabolisms of yet-uncultivated species, interguild interactions within natural and managed ecosystems, and the contributions of microbial populations to the geochemistry of the environment remain poorly understood (1, 2).
There are currently over 60 bacterial and archaeal phylum-level groups that have been observed through the use of 16S rRNA gene sequencing and phylogenetics, with over half containing no cultivated representatives (3). This so-called microbial dark matter comprises a substantial proportion of the tree of life and of microbial communities that likely play significant roles in biogeochemical cycles in a variety of environments (4-9).

Metagenomic analyses of low-diversity microbial communities have yielded robust, near-complete genomic assemblies representative of the abundant populations, expanding the knowledge of the metabolic potential of predominant organisms in these ecosystems (10-14). Nucleotide word frequencies calculated from metagenomic contiguous assembled sequences (contigs) have been used to separate population-specific clusters, or "bins," from the community DNA pool (15-17), which has greatly advanced our understanding of the genomic diversity in natural environments and how populations differ between chemically distinct environments and along environmental gradients (18, 19). Even using modern sequencing techniques, ...