Determination of the DNA G+C content of prokaryotic genomes using traditional methods is time-consuming and results may vary from laboratory to laboratory, depending on the technique used. We explored the possibility of extrapolating the genomic DNA G+C content of prokaryotes from gene sequences. For this, 127 universally conserved genes were studied from 50 prokaryotic genomes in the Clusters of Orthologous Groups database. Of these, 57 genes were present as a single copy in the genomes of 157 different prokaryote species available in GenBank. There was a strong correlation [coefficient of determination (r²) >95 %] between the DNA G+C contents of 20 genes and their corresponding genomes. For each of the 157 prokaryotic genomes studied, the DNA G+C content of the 20 genes was used to determine a 'calculated' genome DNA G+C content (CGC) and this value was compared with the 'real' genome DNA G+C content (RGC). In order to select the most suitable gene for the determination of CGC values, we compared the r² and median mol% difference between CGC and RGC as well as the sensitivity of each gene to provide CGC values for prokaryotic genomes that differ by less than 5 mol% from their RGC. The highly conserved ftsY gene (median size 1144 nucleotides), a vertically inherited member of the GTPase superfamily, showed the highest r² value of 0.98, the smallest median mol% difference between CGC and RGC of 1.06 and a sensitivity of 100 %. Using ftsY DNA G+C content values, the CGC values of 100 genomes not included in the calculation of r² differed by less than 5 mol% from their RGC values. These data suggest that the genomic DNA G+C content of prokaryotes may be estimated easily and reliably from the ftsY gene sequence.
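The evaluation described above (gene G+C content, r², median mol% difference between CGC and RGC, and sensitivity at a 5 mol% threshold) can be illustrated with a minimal sketch. The abstract does not state how CGC is derived from gene G+C content, so an ordinary least-squares line is assumed here. The function names `gc_content`, `fit_line` and `evaluate` are hypothetical, and sensitivity is taken to mean the percentage of genomes whose CGC differs from RGC by less than the threshold.

```python
from statistics import mean, median


def gc_content(seq: str) -> float:
    """Return the G+C content in mol%, counting only unambiguous A/C/G/T bases."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


def fit_line(gene_gc, rgc):
    """Least-squares fit rgc = slope * gene_gc + intercept (assumed model).

    Returns (slope, intercept, r2), where r2 is the coefficient of determination.
    """
    mx, my = mean(gene_gc), mean(rgc)
    sxx = sum((x - mx) ** 2 for x in gene_gc)
    syy = sum((y - my) ** 2 for y in rgc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(gene_gc, rgc))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


def evaluate(gene_gc, rgc, slope, intercept, threshold=5.0):
    """Return (median |CGC - RGC| in mol%, sensitivity in %).

    Sensitivity is assumed to be the share of genomes with |CGC - RGC| < threshold.
    """
    cgc = [slope * g + intercept for g in gene_gc]
    diffs = [abs(c - r) for c, r in zip(cgc, rgc)]
    sensitivity = 100.0 * sum(d < threshold for d in diffs) / len(diffs)
    return median(diffs), sensitivity


if __name__ == "__main__":
    # Invented illustrative values, NOT data from the study.
    gene_gc = [32.0, 41.5, 50.2, 61.0, 68.4]
    rgc = [30.5, 40.0, 50.8, 62.3, 70.1]
    slope, intercept, r2 = fit_line(gene_gc, rgc)
    med_diff, sens = evaluate(gene_gc, rgc, slope, intercept)
    print(f"r2={r2:.3f} median diff={med_diff:.2f} mol% sensitivity={sens:.0f} %")
```

In this sketch the line would be fitted on one set of genomes and `evaluate` applied to genomes left out of the fit, mirroring the check on the 100 genomes not included in the calculation of r².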
INTRODUCTION

The current taxonomic classification of prokaryotes is based on polyphasic taxonomy (Vandamme et al., 1996). This approach combines the genomic and phenotypic characteristics of a strain. The minimum amount of genomic information required for the description of a novel bacterial species must include its phylogenetic classification, DNA-DNA relatedness and the mol% G+C content of DNA (Stackebrandt et al., 2002). Previously, it has been suggested that micro-organisms showing more than 10 mol% difference in DNA G+C contents might not belong to the same genus and that 5 mol% is the common range found within a species (Goodfellow et al., 1997). Of the various methods available for the determination of DNA G+C content (De Ley, 1970; Ko et al., 1977; Mesbah & Whitman, 1989; Owen et al., 1969; Schildkraut et al., 1962; Xu et al., 2000), the thermal denaturation temperature (Tm) method is most commonly used. However, thermal denaturation is a time-consuming method that requires a large amount of DNA and lacks intra- and inter-laboratory reproducibility, and the Tm is calculated using a formula proposed by Mandel et al. (1970) which is not suitable for prokaryotes with very low or elevated DNA G+C contents (Ezaki et al., 1990).

In our laboratory, we have been studying the rpoB gene, encoding the b-subun...