We mapped the collection of 21,076 full-length mouse cDNA clone sequences from The Institute of Physical and Chemical Research (RIKEN, Japan), together with the mouse RefSeq sequences, to the recently completed draft of the mouse genome. All but 532 of the 21,076 clones (97.5%) mapped to the genome assembly. Using this mapping, we identified 3674 mouse genes with multiple transcripts, of which 1098 have splice variants. Alignments of cDNA clone sequences with proteins show that much of the detected splice variation alters coding regions and affects the translated protein. We developed novel analytical techniques to classify observed splice variation and to assess the relation between splice variation and alternative transcription. This analysis indicates that an alternative choice of transcription start site or polyadenylation signal frequently induces splice variation.

High-quality, full-length cDNA sequences mapped to a high-quality, complete genome assembly are crucial for the comprehensive analysis of splice variation. Analysis of 21,076 full-length mouse cDNA clone sequences (RIKEN Genome Exploration Group Phase II Team and FANTOM Consortium 2001) and of mouse RefSeq sequences (Pruitt and Maglott 2001) mapped to the complete mouse genome assembly (ftp://ftp.ensembl.org/pub/assembly/mouse/mgsc_assembly_3) reveals numerous and complex patterns of splice variation. We developed stringent computational filters to identify and classify splice variants while eliminating cloning, sequencing, and mapping errors. Our computational pipeline identified 3674 mouse genes with multiple transcripts, 1098 of which (30%) have splice variants (Fig. 1). Of these 1098 genes with splice variants, 971 (88%) closely matched GenBank proteins (Benson et al. 2000). The protein-to-DNA alignments indicate that most of the splice variation affects the coding regions of transcripts.
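To make the distinction between splice variation and alternative transcription concrete, here is a minimal Python sketch. It is not the authors' pipeline, whose filters are not specified here. The comparison rule, the function names (`splice_sites`, `classify_pair`) and the exon coordinates in the example are all hypothetical. The sketch assumes plus-strand transcripts that have already been mapped to genomic exon coordinates. Under that assumption, it calls two transcripts of the same gene splice variants when the splice sites (exon boundaries) they use inside the region covered by both transcripts differ. Transcripts that differ only in where they start or end are not counted. A difference confined to first or last exons is the pattern one would expect if an alternative transcription start or polyadenylation site drives the choice of splice signal.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive; plus strand assumed
Site = Tuple[str, int]  # ("donor" | "acceptor", genomic position)


def splice_sites(exons: List[Exon]) -> Dict[Site, bool]:
    """Map each splice site to True if it lies on the first or last exon."""
    exons = sorted(exons)
    last = len(exons) - 1
    sites: Dict[Site, bool] = {}
    for k, (start, end) in enumerate(exons):
        terminal = k == 0 or k == last
        if k > 0:  # every exon except the first begins at an acceptor site
            sites[("acceptor", start)] = terminal
        if k < last:  # every exon except the last ends at a donor site
            sites[("donor", end)] = terminal
    return sites


def classify_pair(t1: List[Exon], t2: List[Exon]) -> str:
    """Hypothetical rule: compare splice sites only where both transcripts overlap."""
    lo = max(min(s for s, _ in t1), min(s for s, _ in t2))
    hi = min(max(e for _, e in t1), max(e for _, e in t2))
    if lo > hi:
        return "no overlap"
    s1, s2 = splice_sites(t1), splice_sites(t2)
    in1 = {site for site in s1 if lo <= site[1] <= hi}
    in2 = {site for site in s2 if lo <= site[1] <= hi}
    # Sites used by only one transcript, tagged with whether they sit on a terminal exon.
    differing = [s1[site] for site in in1 - in2] + [s2[site] for site in in2 - in1]
    if not differing:
        return "same splicing"  # e.g. differences only at transcript ends
    if all(differing):
        return "terminal-exon variant"  # e.g. alternative first or last exon
    return "internal-exon variant"  # e.g. a skipped cassette exon


# Hypothetical transcripts of one gene.
base = [(100, 200), (500, 600), (900, 1000)]
skip = [(100, 200), (900, 1000)]                # middle exon skipped
alt_first = [(300, 400), (500, 600), (900, 1000)]  # different first exon
alt_ends = [(150, 200), (500, 600), (900, 1100)]   # same introns, different ends

for other in (skip, alt_first, alt_ends):
    print(classify_pair(base, other))
```

Comparing splice sites rather than whole exons keeps a transcript that is merely shorter or longer at either end (for example, because of a different polyadenylation site) from being counted as a splice variant. A real pipeline would also need strand handling and the error filters mentioned above.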
The type of variation observed in initial and terminal exons indicates that alternative use of transcription start sites and polyadenylation signals may frequently determine the choice of splice signals flanking these exons.

The variant transcripts reveal many known and novel forms of proteins. These include variants of the myosin light chain, phospholipase A2, and a potassium ion channel with alternative 5′ protein sequences. They also include a uridine diphosphate (UDP)-galactose transporter-related protein, variants of osmosis-responsive factor with different 5′ untranslated region (UTR) sequences, and a new form of seryl-tRNA synthetase with an extra internal in-frame coding exon. These examples illustrate the breadth of protein function affected by splice variation. They also illustrate a class of variation well represented in our data set: alternative exons possibly associated with an alternative start of transcription.

Prior large-scale studies of splice variation have used expressed sequence tag (EST) data to focus on two important (2000) used an alternative approach for identifying novel gene forms; they compiled a dataset of well-curated spliceosomal introns and identified alternat...