Methods
Human and mouse transcript-confirmed exons
Human and mouse transcript (mRNA and/or EST)-confirmed exons were extracted from the Alternative Splicing Database (ASD) (Release 2, April 2005) [1]. The AltSplice database in ASD is a computationally derived collection of alternative splicing (AS) events of human and mouse based on alignment of EST and mRNA sequences to the corresponding genomic sequences with high quality and minimal redundancy. In ASD, all transcript-genome alignments with ambiguities were removed. A confirmed intron is defined by an alignment gap of genomic sequence flanked by two splice sites of known types. A confirmed exon is defined by an alignment match flanked by two confirmed introns; therefore, only internal exons are considered as being confirmed. Confirmed introns and exons that overlap with each other indicate AS events. In human, AltSplice has 16 293 genes, including 9945 (61%) with one or more alternative splicing events. In mouse, AltSplice has 16 352 genes, including 8211 (50%) alternatively spliced ones. The higher percentage of alternatively spliced genes in human is probably due to the higher EST coverage.
In this study, we considered only splicing events involving GT-AG intron boundaries. In total, 133 926 and 121 202 exons, plus 200 nucleotides of flanking intronic sequences, were extracted for human and mouse, respectively. Cassette exons are those included in some transcripts but skipped in others, without affecting the two neighboring exons (denoted as SCE, for simple cassette exons, in ASD). We extracted 10 196 and 5992 cassette exons for human and mouse, respectively. We also compiled a set of 30 892 and 37 313 exons that appear to be constitutively spliced in human and mouse, respectively. These exons were extracted from genes without AS events.
A summary of frame-preserving preference and human-mouse conservation (see below) is given in Table S1 and Figure S1.
We also compared other features, such as intron phase bias (data not shown). All of these general statistics are consistent with those reported previously (e.g. [2,3,4]).
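The definitions above (a confirmed exon is an alignment match flanked by two confirmed introns; a simple cassette exon is included in some transcripts and skipped in others without affecting the neighboring exons) can be illustrated with a minimal sketch. This is not the ASD pipeline. The coordinate convention (each intron given as a pair of donor and acceptor positions on one strand of one gene) and the function names `internal_exons` and `simple_cassette_exons` are assumptions made for illustration only.

```python
from typing import List, Set, Tuple

# Hypothetical representation: an intron is (donor_pos, acceptor_pos),
# with donor_pos < acceptor_pos, on a single strand of a single gene.
Intron = Tuple[int, int]
Exon = Tuple[int, int]


def internal_exons(introns: List[Intron]) -> List[Tuple[Intron, Exon, Intron]]:
    """Return each internal exon of one transcript with its two flanking introns.

    An internal exon is the aligned segment between two consecutive introns,
    so terminal exons are never reported.
    """
    ordered = sorted(introns)
    result = []
    for upstream, downstream in zip(ordered, ordered[1:]):
        exon = (upstream[1], downstream[0])  # acceptor of upstream to donor of downstream
        result.append((upstream, exon, downstream))
    return result


def simple_cassette_exons(transcripts: List[List[Intron]]) -> Set[Exon]:
    """Find exons included in one transcript and skipped in another.

    The skipping isoform must contain a single intron that joins the donor of
    the upstream intron to the acceptor of the downstream intron, so the two
    neighboring exons keep the same splice sites.
    """
    all_introns = {intron for transcript in transcripts for intron in transcript}
    cassette = set()
    for transcript in transcripts:
        for upstream, exon, downstream in internal_exons(transcript):
            skipping_intron = (upstream[0], downstream[1])
            if skipping_intron in all_introns:
                cassette.add(exon)
    return cassette


# Toy example: the first transcript includes exon (200, 300), the second skips it.
toy_transcripts = [
    [(100, 200), (300, 400), (500, 600)],
    [(100, 400), (500, 600)],
]
toy_cassette = simple_cassette_exons(toy_transcripts)
```

In this toy input, only the exon between positions 200 and 300 is reported, because the intron (100, 400) in the second transcript spans it; the exon between 400 and 500 has no skipping intron and would count as constitutive under this simplified rule.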
Exon inclusion/skipping level
For each cassette exon, the numbers of transcripts supporting the inclusion and the skipping isoforms were also extracted from ASD [1]. The number of supporting transcripts was used as an approximate measure of the abundance of the exon inclusion/skipping isoform, as done previously [5,6]. The ratio of the skipping to the inclusion isoform, or the ratio of the minor to the major isoform (RMM), was used to estimate the relative abundance of the two isoforms.
Previous studies (e.g. [5,7]) have shown that newly evolved splicing isoforms usually have low abundance, whereas the original ancestral isoforms remain dominant, which minimizes the deleterious effects of new isoforms on the organism. During evolution, the new minor isoform becomes more abundant if it has adaptive benefits and is positively selected. Therefore, RMM represents an approximate measure of the evolutionary age and fitness of an AS event.
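As a minimal sketch of the two ratios described above, the following computes the skipping-to-inclusion ratio and the RMM from transcript counts. The function names and the handling of zero counts (raising an error) are assumptions for illustration; the text does not specify how such cases were treated.

```python
def skipping_to_inclusion_ratio(n_inclusion: int, n_skipping: int) -> float:
    """Ratio of transcripts supporting skipping to those supporting inclusion."""
    if n_inclusion <= 0:
        # Assumption: a cassette exon has at least one inclusion transcript.
        raise ValueError("inclusion isoform needs at least one supporting transcript")
    return n_skipping / n_inclusion


def rmm(n_inclusion: int, n_skipping: int) -> float:
    """Ratio of the minor to the major isoform (RMM), always between 0 and 1."""
    major = max(n_inclusion, n_skipping)
    minor = min(n_inclusion, n_skipping)
    if major <= 0:
        raise ValueError("at least one supporting transcript is required")
    return minor / major


# Toy counts: 8 transcripts include the exon, 2 skip it.
example_ratio = skipping_to_inclusion_ratio(8, 2)
example_rmm = rmm(8, 2)
```

Unlike the skipping-to-inclusion ratio, the RMM does not depend on which isoform is dominant: swapping the two counts gives the same value.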
Frame-preserving p...