Piwi-interacting RNAs (piRNAs) are ~30-nucleotide noncoding RNAs that may be involved in transposon silencing in mammalian germline cells. Most piRNA sequences are found in a small number of genomic regions referred to as clusters, which range from 1 to hundreds of kilobases. We studied the evolution of 140 rodent piRNA clusters, 103 of which do not overlap protein-coding genes. Phylogenetic analysis revealed that 14 clusters were acquired after rat-mouse divergence and another 44 after rodent-primate divergence. Most clusters originated in a process analogous to the duplication of protein-coding genes by ectopic recombination, via insertions of long sequences that were mediated by flanking chromosome-specific repetitive elements (REs). Source sequences for such insertions are often located on the same chromosomes and also harbor clusters. The rate of piRNA cluster expansion is higher than that of any known gene family and, in contrast to other large gene families, there was not a single cluster loss. These observations suggest that piRNA cluster expansion is driven by positive selection, perhaps caused by the need to silence the ever-expanding repertoire of mammalian transposons.

arms race | molecular evolution | small RNA | positive selection

Eukaryotic genomes contain a variety of small noncoding RNAs, including microRNAs (miRNAs), repeat-associated small interfering RNAs (rasiRNAs), small interfering RNAs (siRNAs), and Piwi-interacting RNAs (piRNAs). miRNAs regulate the expression of protein-coding genes, rasiRNAs are involved in transposon silencing, and siRNAs play a dual role in silencing genes and transposons (1). Because of a number of similarities between rasiRNAs in Drosophila and piRNAs in mammals, rasiRNAs are considered to be a subclass of piRNAs. Thus, mammalian piRNAs are hypothesized to also be involved in transposon silencing, although they may perform other functions as well (2). Some noncoding RNAs, in particular miRNAs, evolve very slowly (3). 
In contrast, small-scale evolution of piRNA sequences proceeds at a rate typical of nonfunctional genomic regions (4). Here, we consider the large-scale evolution of mammalian piRNA clusters.
Results

Recent Acquisition of Many piRNA Clusters. Thus far, 140 rat and mouse piRNA clusters have been described, each of which is most likely transcribed as a unit and subsequently processed into mature piRNAs (2, 4-8). We studied the evolution of each of these clusters within their genomic contexts (Table S1). For this purpose, we obtained regions that included 2 flanking protein-coding genes on either side of a cluster and constructed pairwise alignments of orthologous rat, mouse, human, dog, and cow regions. Thirty-seven clusters overlap protein-coding genes, often spanning several exons and introns. All of these clusters are ancestral, being present in rat, mouse, and human, which is not surprising because protein-coding genes are generally conserved. Among the remaining 103 clusters, each of which is contained within an intergenic region, only 43 are ancestr...