A recently discovered wild-type strain, Clostridium beijerinckii G117, is unique in producing butanol and acetone but negligible amounts of ethanol, unlike previously identified acetone-butanol-ethanol (ABE)-generating microbes. Here we report the draft genome sequence of strain G117 (5,806,675 bp; GC content, 29.7%) and the novel findings obtained from its genome annotations.

The solventogenic Clostridium species have attracted renewed scientific attention during the past few years because of the search for sustainable sources of energy (5). Among solventogenic microbes, the Gram-positive anaerobe Clostridium beijerinckii NCIMB 8052 can utilize pentose and hexose sugars to produce acetone, butanol, and ethanol (ABE) without the "glucose repression" effect, which gives it an advantage over another well-known solventogenic organism, Clostridium acetobutylicum (10). Recently, a newly discovered isolate, Clostridium beijerinckii G117, was shown to generate acetone and butanol (AB) but negligible ethanol from fermentation of glucose (1a). The 16S rRNA gene of strain G117 shows 99% identity to that of C. beijerinckii NCIMB 8052. The genome sequencing data for C. beijerinckii NCIMB 8052 will facilitate our understanding of the "omics" (e.g., genomics, transcriptomics, and metabolomics) of solventogenic microbes (5, 9). To further our understanding of solventogenesis in this organism, we present a draft genome sequence of C. beijerinckii G117.

The genome of C. beijerinckii G117 was sequenced by a whole-genome shotgun strategy using a high-throughput Illumina HiSeq 2000 at the Beijing Genomics Institute (BGI, Shenzhen, China). A total of 1,766,892 reads with a 296-bp insert size, totaling 523 Mbp, were obtained, providing 90-fold coverage. Genome sequences were assembled by using the SOAPdenovo program (version 1.05) (6), resulting in 178 contigs with an N50 of 95,461 bp and a total length of 5,806,675 bp for the whole genome.
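As a rough check on the assembly statistics quoted above, the following minimal sketch shows how an N50 value and a fold-coverage figure are conventionally computed. It is illustrative only and is not the procedure used by SOAPdenovo; the function names `n50` and `fold_coverage` and the toy contig lengths are assumptions introduced here, and only the 523 Mbp and 5,806,675 bp figures come from the text.

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fold_coverage(total_read_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_read_bases / genome_size


# Toy contig lengths (hypothetical; not the actual G117 contigs).
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)

# Figures reported in the text: about 523 Mbp of reads, 5,806,675 bp assembly.
g117_coverage = fold_coverage(523_000_000, 5_806_675)

print(toy_n50, round(g117_coverage))
```

With the reported read total and assembly length, the computed depth rounds to the 90-fold coverage stated in the text.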
These contigs were assembled into 89 scaffolds with a maximum length of 271,884 bp. Putative protein-coding sequences were identified by Glimmer (version 3.02) (2) and analyzed by BLASTP. The functions of predicted protein-coding genes were annotated through comparisons with the Kyoto Encyclopedia of Genes and Genomes (KEGG) (3), Clusters of Orthologous Groups (COG) (8), and Swiss-Prot (1) databases. tRNA and rRNA genes were annotated with tRNAscan-SE 1.21 (7) and RNAmmer 1.2 (4), respectively.

The draft genome includes 5,806,675 bp with a low GC content of 29.7%, and the average nucleotide identity between strain G117 and NCIMB 8052 was determined to be 97.39%. The genome contains 5,262 predicted genes, 5,111 protein-coding sequences (CDSs), 54 tRNA genes, and 20 rRNA genes (including 11 5S rRNAs, 3 16S rRNAs, and 6 23S rRNAs). Among the predicted CDSs, 2,687, 2,937, and 2,102 proteins were functionally annotated in the COG, KEGG, and Swiss-Prot databases, respectively. The annotations indicate that a total of 198 CDSs are involved in energy production and conversi...