We introduce cap analysis gene expression (CAGE), which is based on the preparation and sequencing of concatemers of DNA tags derived from the initial 20 nucleotides at the 5′ end of mRNAs. CAGE allows high-throughput gene expression analysis and the profiling of transcriptional start points (TSPs), including promoter usage analysis. By analyzing four libraries (brain, cortex, hippocampus, and cerebellum), we redefined more accurately the TSPs of 11–27% of the analyzed transcriptional units hit by CAGE tags. The frequency of CAGE tags correlates well with the results of other analyses, such as serial analysis of gene expression, and CAGE furthermore maps TSPs more accurately, including in tissue-specific cases. The high-throughput nature of this technology paves the way for understanding gene networks via correlation of promoter usage with transcription factor expression.

full-length cDNA | transcriptome | sequencing | cap-trapping

Even the comparison of mammalian draft genome sequences (1) has left many questions unanswered with regard to the exact identification of expressed genes, their promoter elements, and the network of promoter/transcription factor usage that underlies gene expression. Gene discovery programs based on the sequencing of full-length cDNA libraries (2–4) have provided partial identification of promoter sites; these programs have been instrumental in identifying the sequences of promoter regions, including potential alternative promoters (5). Several thousand promoters can be determined by sequencing the 5′ ends of clones from full-length cDNA libraries and mapping these sequences to the genome, thus determining which genomic regions correspond to coding regions and which to regulatory regions. These analyses can produce statistics on transcriptional start sites derived from large numbers of 5′ end sequences.
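To make the tag-based TSP profiling described above concrete, here is a minimal toy sketch, not the authors' pipeline. It assumes that each sequenced concatemer is a direct concatenation of 20-nt tags with no linkers, that tags are mapped only by exact, unique matches on the forward strand of a small reference string, and that the mapped position of a tag's first base is taken as the candidate TSP. The function names `split_concatemer` and `profile_tsps` and the example sequences are hypothetical; the real CAGE concatemer structure and genome mapping are not modeled.

```python
from collections import Counter

TAG_LEN = 20  # per the text, CAGE tags are the first 20 nt at the 5' end of mRNAs


def split_concatemer(read: str, tag_len: int = TAG_LEN) -> list[str]:
    """Cut a sequenced concatemer into consecutive tags of tag_len nt.

    Toy assumption: tags are joined directly, with no linkers; a trailing
    fragment shorter than tag_len is dropped.
    """
    return [read[i:i + tag_len] for i in range(0, len(read) - tag_len + 1, tag_len)]


def profile_tsps(genome: str, reads: list[str]) -> tuple[Counter, int]:
    """Map each tag by exact, unique match on the forward strand of genome.

    The 0-based match position (the tag's first base, i.e. the mRNA 5' end)
    is taken as the putative TSP. Returns (tag count per TSP, unmapped tags).
    """
    tsp_counts: Counter = Counter()
    unmapped = 0
    for read in reads:
        for tag in split_concatemer(read):
            pos = genome.find(tag)
            # Skip tags that are absent or occur more than once (ambiguous locus).
            if pos == -1 or genome.find(tag, pos + 1) != -1:
                unmapped += 1
            else:
                tsp_counts[pos] += 1
    return tsp_counts, unmapped


# Hypothetical toy data: two transcripts whose 5' ends start at positions 10 and 55.
tag_a = "ATGCGTACGTTAGCCGATCA"
tag_b = "GGCATTCAGGACTTGACCAT"
genome = "N" * 10 + tag_a + "N" * 25 + tag_b + "N" * 5
reads = [tag_a + tag_b + "ACG", tag_a + "T" * 20]

counts, unmapped = profile_tsps(genome, reads)
print(dict(counts), unmapped)  # {10: 2, 55: 1} 1
```

The per-position counts play the role of digital expression values: the more often a tag maps to a given start point, the more that promoter is used in the library.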
However, these methods lack the throughput to provide sufficient data for moderately or lowly expressed genes, chiefly because comprehensive sequencing of cDNA libraries is prohibitively expensive. Microarrays for high-throughput tissue expression analysis do exist (6), but they cannot determine transcriptional start points and therefore cannot be used to accurately identify the cis-regulatory elements that will be essential for computing gene networks. A further limitation of microarrays is that only genes/transcripts already identified by sequencing, which is far from complete (2), can be studied. Serial analysis of gene expression (SAGE) obtains partial sequence information from short tags at the 3′ ends of mRNAs (7). Although the information is partial, it is amenable to relatively cheap, high-throughput digital data collection, because SAGE is based on the cloning and subsequent sequencing of concatemers of short DNA fragments derived from the 3′ ends of multiple mRNAs (http://cgap.nci.nih.gov/SAGE). This method was further improved by Long-SAGE, which allows the cloning of 20-nt SAGE tags (8) that mainly identify single loci on the ge...