Estimating the number of genes in mammals is difficult because of the high proportion of noncoding DNA within the nucleus. In this study, we provide a direct measurement of the number of genes in human and mouse. We have taken advantage of the fact that many mammalian genes are associated with CpG islands, whose distinctive properties allow their physical separation from bulk DNA. Our results suggest that there are approximately 45,000 CpG islands per haploid genome in humans and 37,000 in the mouse. Sequence comparison confirms that about 20% of the human CpG islands are absent from the homologous mouse genes. Analysis of a selection of genes suggests that both human and mouse are losing CpG islands over evolutionary time due to de novo methylation in the germ line followed by CpG loss through mutation. This process appears to be more rapid in rodents. Combining the number of CpG islands with the proportion of island-associated genes, we estimate that the total number of genes per haploid genome is approximately 80,000 in both organisms.
(see refs. 4-6 for reviews). CpG islands constitute a distinctive fraction of the genome because, unlike bulk DNA, they are nonmethylated and contain the dinucleotide CpG at its expected frequency. Another notable property of CpG islands is that their G+C content is significantly higher than that of non-island DNA. This facilitates their identification even in cloned DNA, where the native methylation pattern has been erased (7).
In this study, we have exploited these properties to separate CpG islands from the rest of the genome and determine their absolute number in human and mouse.
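The two sequence properties named above, G+C content and the frequency of CpG relative to expectation, can be computed directly from a cloned sequence. The following is a minimal sketch of that calculation, not the physical separation procedure used in this study. The function name `cpg_stats` is hypothetical, and the observed/expected ratio uses the common convention (number of CpG × sequence length) / (number of C × number of G), which is an assumption not stated in the text.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA sequence.

    Assumption: expected CpG count is (count of C * count of G) / length,
    i.e. the count predicted if C and G were placed independently.
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("sequence must not be empty")

    c_count = s.count("C")
    g_count = s.count("G")
    cpg_count = s.count("CG")  # "CG" cannot overlap itself, so count() is exact

    gc_fraction = (c_count + g_count) / n
    if c_count == 0 or g_count == 0:
        # No CpG is possible; report a ratio of zero rather than dividing by zero.
        return gc_fraction, 0.0

    obs_exp = (cpg_count * n) / (c_count * g_count)
    return gc_fraction, obs_exp


# Illustrative call on a made-up sequence (not data from this study).
gc, ratio = cpg_stats("TTACGCGGCGCATCGA")
print(f"G+C fraction: {gc:.2f}, CpG observed/expected: {ratio:.2f}")
```

A ratio near 1 corresponds to CpG occurring "at its expected frequency", whereas bulk mammalian DNA would be expected to give a markedly lower value because CpG is depleted there.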
A previous estimate of the number of CpG islands was made in mouse by quantitation of end-labeled restriction fragments generated by digestion of total genomic DNA with the methyl-sensitive restriction endonuclease Hpa II (8). An approximate figure of 30,000 CpG islands per haploid genome was suggested. Our results show significant differences between mouse and human that are relevant to our understanding of the origin and maintenance of CpG islands.
Because not all genes have CpG islands, the total number of genes cannot be deduced directly from the number of islands. We have taken the study further by establishing the proportion of genes that are CpG island-associated. Combining the number of CpG islands per genome and the percentage of CpG island-associated genes, we obtain a direct estimate of the total number of genes in human and mouse.
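The arithmetic behind this combination can be sketched as follows. This is an illustration only: it assumes one CpG island per island-associated gene, so that the total gene number is the island count divided by the fraction of genes that carry an island. The function name `estimate_gene_count` and the fraction used in the example call are hypothetical placeholders, not values reported in this section.

```python
def estimate_gene_count(n_islands: float, island_gene_fraction: float) -> float:
    """Estimate total genes per haploid genome from a CpG island count.

    Assumption: each island-associated gene contributes exactly one island,
    so total genes = islands / (fraction of genes that have an island).
    """
    if not 0 < island_gene_fraction <= 1:
        raise ValueError("island_gene_fraction must be in the interval (0, 1]")
    return n_islands / island_gene_fraction


# Hypothetical inputs chosen only to show the calculation.
hypothetical_fraction = 0.5  # placeholder, not a measured proportion
total = estimate_gene_count(45_000, hypothetical_fraction)
print(f"Estimated genes per haploid genome: {total:,.0f}")
```

Because the fraction is at most 1, the estimate can never fall below the island count, which is why the number of islands alone gives only a lower bound on gene number.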
MATERIALS AND METHODS
Cell Culture Conditions. The human lymphoblastoid PES cell line (9) was grown in RPMI 1640 medium containing 10% tryptose phosphate broth and supplemented with 10% fetal bovine serum. The mouse embryo stem-cell line EFC-1 (10) was grown in Glasgow minimal essential medium supplemented with 10% fetal bovine serum, 0.1 mM 2-mercaptoethanol, and 1 mM nonessential am...