Introns are a prevalent feature of eukaryotic genomes, yet their origins and contributions to genome function and evolution remain mysterious. In budding yeast, repression of the highly transcribed intron-containing ribosomal protein genes (RPGs) globally increases splicing of non-RPG transcripts through reduced competition for the spliceosome. We show that under these "hungry spliceosome" conditions, splicing occurs at more than 150 previously unannotated locations, which we call protointrons, that do not overlap known introns. Protointrons use a less constrained set of splice sites and branchpoints than standard introns, including in one case AT-AC in place of GT-AG. Protointrons are not conserved across all closely related species, suggesting that most are not under selection. Some are found in non-coding RNAs (e.g., CUTs and SUTs), where they may contribute to the creation of new genes. Others span boundaries between non-coding and coding sequences or lie within coding sequences, where they offer pathways to the creation of new protein variants or new regulatory controls for existing genes. We define protointrons as (1) nonconserved intron-like sequences that are (2) infrequently spliced and, importantly, (3) not currently understood to contribute to gene expression or regulation in the way that standard introns do. A very few protointrons in S. cerevisiae challenge this classification through their increased splicing frequency and potential function, consistent with the proposed evolutionary process of "intronization", whereby new standard introns are created. This snapshot of intron evolution highlights the important role of the spliceosome in expanding transcribed genomic sequence space, providing a pathway for the rare events that may lead to the birth of new eukaryotic genes and the refinement of existing gene function.
Author Summary
The protein-coding information in eukaryotic genes is interrupted by intervening sequences called introns, which are removed from RNA during transcription by a large protein-RNA complex called the spliceosome. Where introns come from and how the spliceosome contributes to genome evolution are open questions. In this study, we find more than 150 new places in the yeast genome that are recognized by the spliceosome and spliced out as introns. Because they appear to have arisen very recently in evolution through sequence drift and do not appear to contribute to gene expression or its regulation, we call these protointrons. Protointrons are found in both protein-coding and non-coding RNAs and are not efficiently removed by the splicing machinery. Although most protointrons are not conserved, a few are spliced more efficiently and are located where they might begin to play functional roles in gene expression, as predicted by the proposed process of intronization. The challenge now is to understand how spontaneously appearing splicing events like protointrons might contribute to the cr...