Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.
function | human | isoforms | splice | structure
Alternative mRNA splicing, the generation of a diverse range of mature RNAs, has considerable potential to expand the cellular protein repertoire (1-3), and recent studies have estimated that 40-80% of multiexon human genes can produce differently spliced mRNAs (4, 5). The importance of alternative splicing in processes such as development (6) has long been recognized, and proteins coded by alternatively spliced transcripts have been implicated in a number of cellular pathways (7-9).
The extent of alternative splicing in eukaryotic genomes has led to suggestions that alternative splicing is key to understanding how human complexity can be encoded by so few genes (10).
The pilot project of the Encyclopedia of DNA Elements (ENCODE) (11), which aims to identify all the functional elements in the human genome, has undertaken a comprehensive analysis of 44 selected regions that make up 1% of the human genome. One valuable element of the project has been the detailing of a reference set of manually annotated splice variants by the GENCODE consortium (12). The annotation by the GENCODE consortium is an extension of the manually curated annotation by the Havana team at The Sanger Institute.
Although a full understanding of the functional implications of alternative splicing is still a long way off, the GENCODE set has provided us with the material to make an in-depth assessment of a systematically collected reference set of splice variants.
Results
Alternative Splicing Frequency. The GENCODE set is made up of 2,608 annotated transcripts for 487 distinct loci. A total of 1,097 transcripts from 434 loci are predicted to be protein coding. There are on average 2.53 protein coding variants per locus; 182 loci have only one variant, whereas one locus, RP1-309K20.2 (CPNE1), has 17 coding variants.
A total of 57.8% of the loci are annotated with alternatively spliced transcripts, although there are differences between target re...