Five years after the completion of the Drosophila melanogaster genome sequence, the number of protein-coding genes it contains remains a matter of debate; the number of computational gene predictions greatly exceeds the number of validated gene annotations. We have assembled a collection of >10,000 gene predictions that do not overlap existing gene annotations. We have also developed a validation process that lets us efficiently prioritize predictions from various sources and test them experimentally, sequencing RT-PCR products to confirm gene structures. Our data provide experimental evidence for 122 protein-coding genes. Our analyses suggest that the entire collection of predictions contains only ≈700 additional protein-coding genes. Although we cannot rule out the discovery of genes with unusual features that make them refractory to existing methods, our results suggest that the D. melanogaster genome contains ≈14,000 protein-coding genes.

gene number | validation | genome annotation

The total number of protein-coding genes in the Drosophila melanogaster genome remains a subject of debate. Whereas those who curated the D. melanogaster genome concluded that the 13,659 genes annotated in release 3.1 likely constitute 95% of all protein-coding genes (1), other researchers have concluded that many, possibly thousands, of protein-coding genes remain unannotated (2). Two issues have fueled the debate over gene number in D. melanogaster: the large number of computational gene predictions located in intergenic regions, and the varying standards of experimental evidence required to conclude that a gene prediction corresponds to a real gene.

As of release 3.1, ≈50% of the D. melanogaster genome is intergenic. Running the gene prediction program GENSCAN (3) on every intergenic region in the D. melanogaster genome yields 10,644 gene predictions spread among 62 megabases (Mb) of annotation-free sequence.
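As an illustration of the first filtering step (keeping only predictions that do not overlap any existing annotation), here is a minimal Python sketch. It assumes that predictions and annotations have already been parsed into 1-based, inclusive intervals (GENSCAN output parsing is not shown), and it ignores strand, which may not match the exact overlap criterion used in this study. The names `Interval`, `AnnotationIndex`, and `intergenic_predictions`, and all coordinates below, are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    """Hypothetical feature record: 1-based, inclusive coordinates (GFF-style)."""
    chrom: str
    start: int
    end: int
    name: str = ""


class AnnotationIndex:
    """Per-chromosome index answering: does [start, end] overlap any annotation?"""

    def __init__(self, annotations: Iterable[Interval]):
        by_chrom = defaultdict(list)
        for a in annotations:
            by_chrom[a.chrom].append((a.start, a.end))
        self._starts = {}
        self._max_ends = {}
        for chrom, spans in by_chrom.items():
            spans.sort()
            self._starts[chrom] = [s for s, _ in spans]
            # Prefix maximum of end coordinates: entry i is the furthest end
            # reached by any annotation among the first i + 1 (sorted by start).
            running, max_ends = 0, []
            for _, e in spans:
                running = max(running, e)
                max_ends.append(running)
            self._max_ends[chrom] = max_ends

    def overlaps(self, iv: Interval) -> bool:
        starts = self._starts.get(iv.chrom)
        if not starts:
            return False  # no annotations on this chromosome arm
        # Annotations [0, k) start at or before iv.end; one of them overlaps
        # iv exactly when the furthest end among them reaches iv.start.
        k = bisect_right(starts, iv.end)
        return k > 0 and self._max_ends[iv.chrom][k - 1] >= iv.start


def intergenic_predictions(predictions: Iterable[Interval],
                           annotations: Iterable[Interval]) -> List[Interval]:
    """Keep only predictions that overlap no annotated gene (strand ignored)."""
    index = AnnotationIndex(annotations)
    return [p for p in predictions if not index.overlaps(p)]


# Hypothetical toy data, not taken from the D. melanogaster annotation.
annotations = [Interval("2L", 1000, 5000, "geneA"),
               Interval("2L", 9000, 12000, "geneB")]
predictions = [Interval("2L", 4800, 6000, "pred1"),   # overlaps geneA
               Interval("2L", 6000, 8999, "pred2"),   # lies between genes
               Interval("3R", 100, 900, "pred3")]     # no annotations on 3R

kept = intergenic_predictions(predictions, annotations)
print([p.name for p in kept])
```

Sorting annotation starts and storing a prefix maximum of their ends makes each query a single binary search rather than a scan over all annotations, which matters when screening thousands of predictions against a whole-genome annotation set.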
Surely some of these predictions are real, but how many? The best way to answer this question is to subject a representative sample of the gene predictions to a validation procedure.

The design and interpretation of experiments intended to assay expression of computationally predicted genes have become controversial. One approach is to rely on hybridization to microarrays or on RT-PCR assays for transcript expression (2), taking the detection of a product by agarose gel electrophoresis as confirmation of the corresponding gene prediction. However, as our results show, unless the diagnostic PCR product spans a splice junction, amplification of residual genomic DNA and detection of unprocessed transcripts may lead to false verifications of gene predictions. It is also critical to determine the sequence, and not just the size, of the PCR products (4).

One way to obtain spliced cDNAs for sequencing is to perform RT-PCR with a 3′-oligo(dT) primer and an upstream PCR primer located in the prediction's 5′-most exon. The advantages of this approach are that it requires only a ...