Comparison of related genomes has emerged as a powerful lens for genome interpretation. Here, we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and report constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparison with experimental datasets to suggest candidate functions for ~60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events, and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements, and ~1,000 primate- and human-accelerated elements. Overlap with disease-associated variants suggests our findings will be relevant for studies of human biology and health.
Despite the conventional distinction between them, promoters and enhancers share many features in mammals, including divergent transcription and similar modes of transcription factor binding. Here, we examine the architecture of transcription initiation through comprehensive mapping of transcription start sites (TSSs) in two tier 1 ENCODE cell lines: GM12878 (human lymphoblastoid B cells) and K562 (chronic myelogenous leukemia). Using a nuclear run-on protocol called GRO-cap, which captures TSSs for both stable and unstable transcripts, we conduct detailed comparisons of thousands of promoters and enhancers in human cells. These analyses reveal a common architecture of initiation, including tightly spaced (110 bp) divergent initiation, similar frequencies of core-promoter sequence elements, highly positioned flanking nucleosomes, and two modes of transcription factor binding. Post-initiation transcript stability provides a more fundamental distinction between promoters and enhancers than patterns of histone modifications, transcription factors or co-activators. These results support a unified model of transcription initiation at promoters and enhancers.
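The spacing of divergent initiation can be made concrete with a small sketch. It assumes TSSs have already been called and are given as integer coordinates on a single chromosome, split by strand; the function name `pair_divergent_tss`, the nearest-neighbour pairing rule and the 300 bp `max_gap` cutoff are hypothetical choices for illustration, not the GRO-cap analysis pipeline. In divergent initiation the antisense (minus-strand) TSS lies upstream of the sense (plus-strand) TSS, so each pair's spacing is the plus coordinate minus the minus coordinate.

```python
from bisect import bisect_right
from statistics import median


def pair_divergent_tss(plus_tss, minus_tss, max_gap=300):
    """Pair each minus-strand TSS with the nearest plus-strand TSS strictly
    downstream of it (higher coordinate) within max_gap bp.

    Hypothetical helper: returns a list of (minus_pos, plus_pos, spacing).
    """
    plus_sorted = sorted(plus_tss)
    pairs = []
    for m in sorted(minus_tss):
        i = bisect_right(plus_sorted, m)  # index of the first plus TSS > m
        if i < len(plus_sorted) and plus_sorted[i] - m <= max_gap:
            pairs.append((m, plus_sorted[i], plus_sorted[i] - m))
    return pairs


# Hypothetical TSS coordinates on one chromosome.
plus = [1110, 5120, 9000]
minus = [1000, 5000, 20000]
pairs = pair_divergent_tss(plus, minus)
print(pairs)                           # [(1000, 1110, 110), (5000, 5120, 120)]
print(median(d for _, _, d in pairs))  # 115.0
```

A simple nearest-neighbour rule like this can assign one plus-strand TSS to several minus-strand TSSs; a real analysis would need a more careful pairing and a genome-wide distribution of spacings.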
“Orangutan” is derived from the Malay term “man of the forest” and aptly describes the Southeast Asian great apes native to Sumatra and Borneo. The orangutan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orangutan draft genome assembly and short-read sequence data from five Sumatran and five Bornean orangutan genomes. Our analyses reveal that, compared to other primates, the orangutan genome has many unique features. Structural evolution of the orangutan genome has proceeded much more slowly than that of other great apes, as evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe the first primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orangutan genome structure. Orangutans have extremely low energy usage for a eutherian mammal [1], far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways, including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of the Bornean/Sumatran speciation time, approximately 400,000 years ago, is more recent than most previous estimates and underscores the complexity of the orangutan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (Ne) expanded exponentially relative to the ancestral Ne after the split, while Bornean Ne declined over the same period.
Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.
RNA polymerase II (Pol II) transcribes genes spanning up to hundreds of kilobases of DNA, and the time this takes limits the production of mRNAs and lncRNAs. We used global run-on sequencing (GRO-seq) to measure the rates of transcription by Pol II following gene activation. Elongation rates vary by as much as 4-fold at different genomic loci and in response to two distinct cellular signaling pathways (17β-estradiol [E2] and TNFα). The rates are slowest near the promoter and increase during the first ~15 kb transcribed. Gene body elongation rates correlate with Pol II density, resulting in systematically higher rates of transcript production at genes with higher Pol II density. Pol II dynamics following short inductions indicate that E2 stimulates gene expression by increasing Pol II initiation, whereas TNFα reduces Pol II residence time at pause sites. Collectively, our results identify previously uncharacterized variation in the rate of transcription and highlight elongation as an important, variable, and regulated rate-limiting step during transcription.
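The arithmetic behind an elongation-rate estimate can be sketched as distance over time: after induction, the leading edge of the Pol II wave moves into the gene body, and its advance between time points gives a rate. This sketch assumes the wave-front positions are already known; the abstract does not say how they are located, and the time points and positions below are hypothetical, as are the helper names `interval_rates` and `fitted_rate`.

```python
def interval_rates(times_min, front_kb):
    """Elongation rate (kb/min) between consecutive time points: the distance
    the Pol II wave front advanced divided by the elapsed time."""
    return [
        (front_kb[i + 1] - front_kb[i]) / (times_min[i + 1] - times_min[i])
        for i in range(len(times_min) - 1)
    ]


def fitted_rate(times_min, front_kb):
    """A single overall rate: ordinary least-squares slope of front vs. time."""
    n = len(times_min)
    t_mean = sum(times_min) / n
    f_mean = sum(front_kb) / n
    num = sum((t - t_mean) * (f - f_mean) for t, f in zip(times_min, front_kb))
    den = sum((t - t_mean) ** 2 for t in times_min)
    return num / den


# Hypothetical wave-front positions (kb downstream of the TSS) after induction.
times = [10, 25, 40]
fronts = [10.0, 46.0, 91.0]
print(interval_rates(times, fronts))  # [2.4, 3.0]
print(fitted_rate(times, fronts))     # 2.7
```

Per-interval rates can expose a change in speed along the gene (here the later interval is faster), which a single fitted slope averages away.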
Transcriptional regulatory elements (TREs), including enhancers and promoters, determine the transcription levels of associated genes. We have recently shown that global run-on and sequencing (GRO-seq) with enrichment for 5'-capped RNAs reveals active TREs with high accuracy. Here, we demonstrate that active TREs can be identified by applying sensitive machine-learning methods to standard GRO-seq data. This approach allows TREs to be assayed together with gene expression levels and other transcriptional features in a single experiment. Our prediction method, called discriminative Regulatory Element detection from GRO-seq (dREG), summarizes GRO-seq read counts at multiple scales and uses support vector regression to identify active TREs. The predicted TREs are more strongly enriched for several marks of transcriptional activation, including eQTL, GWAS-associated SNPs, H3K27ac, and transcription factor binding than those identified by alternative functional assays. Using dREG, we survey TREs in eight human cell types and provide new insights into global patterns of TRE function.
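The idea of summarizing GRO-seq read counts at multiple scales can be sketched as a feature-extraction step. This is not dREG's actual implementation: the window sizes in `scales`, the number of windows per side (`n_windows`) and the helper name `multiscale_features` are hypothetical, and any normalization dREG applies is omitted. The input is assumed to be per-base read counts for each strand; the resulting vectors could then be used to train a support vector regressor (for example scikit-learn's `sklearn.svm.SVR`), a step not shown here.

```python
def multiscale_features(plus, minus, center, scales=(10, 50, 500), n_windows=2):
    """Summarize strand-specific read counts around `center` at several scales.

    plus, minus: per-base read counts for each strand (lists of ints).
    For each window width w in `scales` and each strand, sum the reads in
    2 * n_windows adjacent windows spanning
    [center - n_windows * w, center + n_windows * w).
    Windows falling outside the array contribute 0.
    """
    features = []
    for w in scales:
        for strand in (plus, minus):
            for k in range(-n_windows, n_windows):
                lo = max(center + k * w, 0)
                hi = min(center + (k + 1) * w, len(strand))
                features.append(float(sum(strand[lo:hi])) if lo < hi else 0.0)
    return features


plus = [0] * 1000
minus = [0] * 1000
plus[505] = 3   # sense reads just downstream of the candidate site
minus[495] = 2  # antisense reads just upstream: a divergent pattern
print(multiscale_features(plus, minus, 500, scales=(10,)))
# [0.0, 0.0, 3.0, 0.0, 0.0, 2.0, 0.0, 0.0]
print(len(multiscale_features(plus, minus, 500)))  # 24
```

Combining fine windows with coarse ones lets one vector capture both the tight divergent pair at a TRE and the broader transcriptional context around it.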
Previous studies have shown that GAGA Factor (GAF) is enriched on promoters with paused RNA Polymerase II (Pol II), but its genome-wide function and mechanism of action remain largely uncharacterized. We assayed the levels of transcriptionally engaged polymerase using global run-on sequencing (GRO-seq) in control and GAF-RNAi Drosophila S2 cells and found that promoter-proximal polymerase was significantly reduced on a large subset of paused promoters where GAF occupancy was reduced by knockdown. These promoters show a dramatic increase in nucleosome occupancy upon GAF depletion. These results, in conjunction with previous studies showing that GAF directly interacts with nucleosome remodelers, strongly support a model in which GAF directs nucleosome displacement at the promoter and thereby allows Pol II to enter the promoter and reach pause sites. This action of GAF on nucleosomes is at least partially independent of paused Pol II, because intergenic GAF binding sites with little or no Pol II also show GAF-dependent nucleosome displacement. In addition, the insulator factor BEAF, the BEAF-interacting protein Chriz, and the transcription factor M1BP are strikingly enriched on those GAF-associated genes where pausing is unaffected by knockdown, suggesting that insulators or the alternative promoter-associated factor M1BP protect a subset of GAF-bound paused genes from GAF knockdown effects. Thus, GAF binding at promoters can lead to the local displacement of nucleosomes, but this activity can be restricted or compensated for when insulator protein or M1BP complexes also reside at GAF-bound promoters.
The Drosophila Dscam1 gene encodes a vast number of cell recognition molecules through alternative splicing. These exhibit isoform-specific homophilic binding and regulate self-avoidance, the tendency of neurites from the same cell to repel one another. Genetic experiments indicate that different cells must express different isoforms. How this is achieved is not known, as the expression of alternative exons in vivo has not been shown. Here, we modified the endogenous Dscam1 locus to generate splicing reporters for all variants of exon 4. We demonstrate that splicing does not occur in a cell-type-specific fashion, that cells identified by their unique locations express different exon 4 variants in different animals, and that splicing in identified neurons can change over time. Probabilistic expression is compatible with a widespread role in neural circuit assembly through self-avoidance and is incompatible with models in which specific isoforms of Dscam1 mediate recognition between processes of different cells.
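Why probabilistic choice can still keep different cells distinct can be illustrated with a toy calculation. If each cell independently picks one variant with probabilities p_i, two cells pick the same one with probability equal to the sum of p_i squared, which is small when there are many comparably frequent variants. The abstract gives no variant frequencies, so the 12 equally likely variants and the skewed distribution below are illustrative assumptions, as are the helper names; the full Dscam1 isoform also depends on other alternatively spliced exons, so this considers only a single exon choice.

```python
import random


def shared_variant_probability(freqs):
    """Exact probability that two cells, each choosing one variant
    independently with the given relative frequencies, choose the same one:
    the sum of p_i ** 2 after normalizing the frequencies."""
    total = sum(freqs)
    return sum((f / total) ** 2 for f in freqs)


def simulate_shared(freqs, trials=100_000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    variants = range(len(freqs))
    hits = 0
    for _ in range(trials):
        a, b = rng.choices(variants, weights=freqs, k=2)
        hits += a == b
    return hits / trials


uniform = [1] * 12       # illustrative: 12 equally likely variants
skewed = [6] + [1] * 11  # illustrative: one dominant variant
print(shared_variant_probability(uniform))  # about 0.083 (1/12)
print(shared_variant_probability(skewed))   # about 0.163 (47/289)
print(simulate_shared(uniform))             # close to the exact value
```

The skewed case shows that a dominant variant raises the chance that two cells match, so the benefit of random choice depends on how evenly the variants are used.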
How evolutionary changes at enhancers affect the transcription of target genes remains an important open question. Previous comparative studies of gene expression have largely measured the abundance of mRNA, which is affected by post-transcriptional regulatory processes, hence limiting inferences about the mechanisms underlying expression differences. Here we directly measured nascent transcription in primate species, allowing us to separate transcription from post-transcriptional regulation. We used PRO-seq to map RNA polymerases in resting and activated CD4+ T-cells in multiple human, chimpanzee, and rhesus macaque individuals, with rodents as outgroups. We observed general conservation in coding and non-coding transcription, punctuated by numerous differences between species, particularly at distal enhancers and non-coding RNAs. Genes regulated by larger numbers of enhancers are more frequently transcribed at evolutionarily stable levels, despite reduced conservation at individual enhancers. Adaptive nucleotide substitutions are associated with lineage-specific transcription, and at one locus, SGPP2, we predict and experimentally validate that multiple substitutions contribute to human-specific transcription. Collectively, our findings suggest a pervasive role for evolutionary compensation across ensembles of enhancers that jointly regulate target genes.