In primates, the tandemly repeated genes encoding U2 small nuclear RNA evolve concertedly, i.e. the sequence of the U2 repeat unit is essentially homogeneous within each species but differs somewhat between species. Using chromosome painting and the NGFR gene as an outside marker, we show that the U2 tandem array (RNU2) has remained at the same chromosomal locus (equivalent to human 17q21) through multiple speciation events over > 35 million years leading to the Old World monkey and hominoid lineages. The data suggest that the U2 tandem repeat, once established in the primate lineage, contained sequence elements favoring perpetuation and concerted evolution of the array in situ, despite a pericentric inversion in chimpanzee, a reciprocal translocation in gorilla and a paracentric inversion in orang utan. Comparison of the 11 kb U2 repeat unit found in baboon and other Old World monkeys with the 6 kb U2 repeat unit in humans and other hominids revealed that an ancestral U2 repeat unit was expanded by insertion of a 5 kb retrovirus bearing 1 kb long terminal repeats (LTRs). Subsequent excision of the provirus by homologous recombination between the LTRs generated a 6 kb U2 repeat unit containing a solo LTR. Remarkably, both junctions between the human U2 tandem array and flanking chromosomal DNA at 17q21 fall within the solo LTR sequence, suggesting a role for the LTR in the origin or maintenance of the primate U2 array.
Cockayne syndrome (CS) is a devastating progeria most often caused by mutations in the CSB gene encoding a SWI/SNF family chromatin remodeling protein. Although all CSB mutations that cause CS are recessive, the complete absence of CSB protein does not cause CS. In addition, most CSB mutations are located beyond exon 5 and are thought to generate only C-terminally truncated protein fragments. We now show that a domesticated PiggyBac-like transposon PGBD3, residing within intron 5 of the CSB gene, functions as an alternative 3′ terminal exon. The alternatively spliced mRNA encodes a novel chimeric protein in which CSB exons 1–5 are joined in frame to the PiggyBac transposase. The resulting CSB-transposase fusion protein is as abundant as CSB protein itself in a variety of human cell lines, and continues to be expressed by primary CS cells in which functional CSB is lost due to mutations beyond exon 5. The CSB-transposase fusion protein has been highly conserved for at least 43 Myr since the divergence of humans and marmoset, and appears to be subject to selective pressure. The human genome contains over 600 nonautonomous PGBD3-related MER85 elements that were dispersed when the PGBD3 transposase was last active at least 37 Mya. Many of these MER85 elements are associated with genes which are involved in neuronal development, and are known to be regulated by CSB. We speculate that the CSB-transposase fusion protein has been conserved for host antitransposon defense, or to modulate gene regulation by MER85 elements, but may cause CS in the absence of functional CSB protein.
We have surveyed the tandemly repeated genes encoding U2 snRNA in a diverse panel of humans. We found only two polymorphisms within the U2 repeat unit: a SacI polymorphism (alleles SacI+ or SacI-) and a CT microsatellite polymorphism (alleles CT+ or CT-). Surprisingly, individual U2 tandem arrays are entirely SacI+ or SacI-, and entirely CT+ or CT-, although the SacI and CT alleles can occur in any combination. We also found that polymorphisms in the left and right junction regions flanking the tandem array fall into only two haplotypes (JL+ and JL-, JR+ and JR-). Most surprisingly, JL+ is always associated with JR+, and JL- with JR-. Thus individual U2 arrays do not exchange flanking markers, despite independent assortment and subsequent homogenization of the SacI and CT alleles within the U2 repeat units. We propose that the primary driving force for concerted evolution of the tandem U2 genes is intrachromosomal homogenization; interchromosomal genetic exchanges are much rarer, and reciprocal nonsister chromatid exchange apparently does not occur. Thus concerted evolution of the U2 tandem array occurs in situ along a chromosome lineage, and linkage disequilibrium between sequences flanking the U2 array may persist for long periods of time.
Cockayne syndrome is a segmental progeria most often caused by mutations in the CSB gene encoding a SWI/SNF-like ATPase required for transcription-coupled DNA repair (TCR). More than 43 Mya, before marmosets diverged from humans, a piggyBac3 (PGBD3) transposable element integrated into intron 5 of the CSB gene. As a result, primate CSB genes now generate both CSB protein and a conserved CSB-PGBD3 fusion protein in which the first 5 exons of CSB are alternatively spliced to the PGBD3 transposase. Using a host cell reactivation assay, we show that the fusion protein inhibits TCR of oxidative damage but facilitates TCR of UV damage. We also show by microarray analysis that expression of the fusion protein alone in CSB-null UV-sensitive syndrome (UVSS) cells induces an interferon-like response that resembles both the innate antiviral response and the prolonged interferon response normally maintained by unphosphorylated STAT1 (U-STAT1); moreover, as might be expected based on conservation of the fusion protein, this potentially cytotoxic interferon-like response is largely reversed by coexpression of functional CSB protein. Interestingly, expression of CSB and the CSB-PGBD3 fusion protein together, but neither alone, upregulates the insulin-like growth factor binding protein IGFBP5 and downregulates IGFBP7, suggesting that the fusion protein may also confer a metabolic advantage, perhaps in the presence of DNA damage. Finally, we show that the fusion protein binds in vitro to members of a dispersed family of 900 internally deleted piggyBac elements known as MER85s, providing a potential mechanism by which the fusion protein could exert widespread effects on gene expression. Our data suggest that the CSB-PGBD3 fusion protein is important in both health and disease, and could play a role in Cockayne syndrome.
The CSB-PGBD3 fusion protein arose more than 43 million years ago when a 2.5-kb piggyBac 3 (PGBD3) transposon inserted into intron 5 of the Cockayne syndrome Group B (CSB) gene in the common ancestor of all higher primates. As a result, full-length CSB is now coexpressed with an abundant CSB-PGBD3 fusion protein by alternative splicing of CSB exons 1–5 to the PGBD3 transposase. An internal deletion of the piggyBac transposase ORF also gave rise to 889 dispersed, 140-bp MER85 elements that were mobilized in trans by PGBD3 transposase. The CSB-PGBD3 fusion protein binds MER85s in vitro and induces a strong interferon-like innate antiviral immune response when expressed in CSB-null UVSS1KO cells. To explore the connection between DNA binding and gene expression changes induced by CSB-PGBD3, we investigated the genome-wide DNA binding profile of the fusion protein. CSB-PGBD3 binds to 363 MER85 elements in vivo, but these sites do not correlate with gene expression changes induced by the fusion protein. Instead, CSB-PGBD3 is enriched at AP-1, TEAD1, and CTCF motifs, presumably through protein–protein interactions with the cognate transcription factors; moreover, recruitment of CSB-PGBD3 to AP-1 and TEAD1 motifs correlates with nearby genes regulated by CSB-PGBD3 expression in UVSS1KO cells and downregulated by CSB rescue of mutant CS1AN cells. Consistent with these data, the N-terminal CSB domain of the CSB-PGBD3 fusion protein interacts with the AP-1 transcription factor c-Jun and with RNA polymerase II, and a chimeric CSB-LacI construct containing only the N-terminus of CSB upregulates many of the genes induced by CSB-PGBD3. We conclude that the CSB-PGBD3 fusion protein substantially reshapes the transcriptome in CS patient CS1AN and that continued expression of the CSB-PGBD3 fusion protein in the absence of functional CSB may affect the clinical presentation of CS patients by directly altering the transcriptional program.
Infection of human cells with oncogenic adenovirus type 12 (Ad12) induces four specific chromosome fragile sites. Remarkably, three of these sites appear to colocalize with tandem arrays of genes encoding small, abundant, ubiquitously expressed structural RNAs: the RNU1 locus encoding U1 small nuclear RNA (snRNA), the RNU2 locus encoding U2 snRNA, and the RN5S locus encoding 5S rRNA. Recently, an artificial tandem array of the natural 5.8-kb U2 repeat unit has been shown to generate a new Ad12-inducible fragile site (Y.-P. Li, R. Tomanin, J. R. Smiley, and S. Bacchetti, Mol. Cell. Biol. 13:6064-6070, 1993), demonstrating that the U2 repeat unit alone is sufficient for virally induced fragility. To identify elements within the U2 repeat unit that are required for virally induced fragility, we generated cell lines containing artificial tandem arrays of the entire 5.8-kb repeat unit, an 834-bp fragment spanning the U2 gene alone, or the same 834-bp fragment from which key U2 transcriptional regulatory elements had been deleted. The U2 snRNA coding regions within each artificial array were marked by an innocuous single base change (U to C at position 87) so that the relative expression of supernumerary and endogenous U2 genes could be monitored by a primer extension assay. We find that artificial arrays of both the 5.8- and the 0.8-kb U2 repeat units are fragile but that arrays lacking either the distal sequence element or both the distal and the proximal sequence elements of the promoter are not. Surprisingly, variations in repeat copy number and/or transcriptional activity of the artificial arrays do not appear to correlate with the degree of Ad12-inducible fragility. We conclude that U2 transcriptional regulatory elements are required for virally induced fragility but not necessarily U2 snRNA transcription per se.
Background: piggyBac domain (PGBD) transposons are found in organisms ranging from fungi to humans. Three domesticated piggyBac elements have been described. In the ciliates Paramecium tetraurelia and Tetrahymena thermophila, homologs known as piggyMacs excise internal eliminated sequences from germline micronuclear DNA during regeneration of the new somatic macronucleus. In primates, a PGBD3 element inserted into the Cockayne syndrome group B (CSB) gene over 43 Mya serves as an alternative 3′ terminal exon, enabling the CSB gene to generate both full length CSB and a conserved CSB-PGBD3 fusion protein that joins an N-terminal CSB domain to the C-terminal transposase domain. Results: We describe a fourth domesticated piggyBac element called PGBD5. We show that i) PGBD5 was first domesticated in the common ancestor of the cephalochordate Branchiostoma floridae (aka lancelet or amphioxus) and vertebrates, and is conserved in all vertebrates including lamprey but cannot be found in more basal urochordates, hemichordates, or echinoderms; ii) the lancelet, lamprey, and human PGBD5 genes are syntenic and orthologous; iii) no potentially mobile ancestral PGBD5 elements can be identified in other more deeply rooted organisms; iv) although derived from an IS4-related transposase of the RNase H clan, PGBD5 protein is unlikely to retain enzymatic activity because the catalytic DDD(D) motif is not conserved; v) PGBD5 is preferentially expressed in certain granule cell lineages of the brain and in the central nervous system based on available mouse and human in situ hybridization data, and the tissue-specificity of documented mammalian EST and mRNA clones; vi) the human PGBD5 promoter and gene region is rich in bound regulatory factors including the neuron-restrictive silencer factors NRSF/REST and CoREST, as well as SIN3, KAP1, STAT3, and CTCF; and vii) despite preferential localization within the nucleus, PGBD5 protein is unlikely to bind DNA or chromatin as neither DNase I digestion nor high salt extraction releases PGBD5 from fractionated mouse brain nuclei. Conclusions: We speculate that the neural-specific PGBD5 transposase was domesticated >500 My after cephalochordates and vertebrates split from urochordates, and that PGBD5 may have played a role in the evolution of a primitive deuterostome neural network into a centralized nervous system.
The wide prevalence and regulated expression of long noncoding RNAs (lncRNAs) highlight their functional roles, but the molecular basis for their activities and structure-function relationships remains to be investigated, with few exceptions. Among the relatively few lncRNAs conserved over significant evolutionary distances is the long intergenic noncoding RNA (lincRNA) Cyrano (orthologous to human OIP5-AS1), which contains a region of 300 highly conserved nucleotides within tetrapods, which in turn contains a functional stretch of 26 nt of deep conservation. This region binds to and facilitates the degradation of the microRNA miR-7, a short ncRNA with multiple cellular functions, including modulation of oncogenic expression. We probed the secondary structure of Cyrano in vitro and in cells using chemical and enzymatic probing, and validated the results using comparative sequence analysis. At the center of the functional core of Cyrano is a cloverleaf structure maintained over the >400 million years of divergent evolution that separates fish and primates. This strikingly conserved motif provides interaction sites for several RNA-binding proteins and masks a conserved recognition site for miR-7. Conservation in this region strongly suggests that the function of Cyrano depends on the formation of this RNA structure, which could modulate the rate and efficiency of degradation of miR-7.