We performed a genome-wide analysis of transcriptional start sites (TSSs) in human genes by multifaceted use of a massively parallel sequencer. By analyzing 800 million sequences that were obtained from various types of transcriptome analyses, we characterized 140 million TSS tags in 12 human cell types. Despite the large number of TSS clusters (TSCs), the number of TSCs was observed to decrease sharply with increasing expression levels. Highly expressed TSCs exhibited several characteristic features: Nucleosome-seq analysis revealed highly ordered nucleosome structures, ChIP-seq analysis detected clear RNA polymerase II binding signals in their surrounding regions, evaluations of previously sequenced and newly shotgunsequenced complete cDNA sequences showed that they encode preferable transcripts for protein translation, and RNA-seq analysis of polysome-incorporated RNAs yielded direct evidence that those transcripts are actually translated into proteins. We also demonstrate that integrative interpretation of transcriptome data is essential for the selection of putative alternative promoter TSCs, two of which also have protein consequences. Furthermore, discriminative chromatin features that separate TSCs at different expression levels were found for both genic TSCs and intergenic TSCs. The collected integrative information should provide a useful basis for future biological characterization of TSCs.
Ultraconservation, defined as perfect human-to-rodent sequence identity at least 200-bp long, is a strong indicator of evolutionary and functional importance and has been explored extensively at the genome level. However, it has not been investigated at the transcript level, where such extreme conservation might highlight loci with important post-transcriptional regulatory roles. We present 96 ultraconserved cDNA segments (UCSs), stretches of human mature mRNAs that match identically with orthologous regions in the mouse and rat genomes. UCSs can span multiple exons, a feature we leverage here to elucidate the role of ultraconservation in post-transcriptional regulation. UCS sites are implicated in functions at essentially every post-transcriptional stage: pre-mRNA splicing and degradation through alternative splicing and nonsense-mediated decay (AS-NMD), mature mRNA silencing by miRNA, fast mRNA decay rate and translational repression by upstream AUGs. We also found UCSs to exhibit resistance to formation of RNA secondary structure. These multiple layers of regulation underscore the importance of the UCS-containing genes as key global RNA processing regulators, including members of the serine/arginine-rich protein and heterogeneous nuclear ribonucleoprotein (hnRNP) families of essential splicing regulators. The discovery of UCSs shed new light on the multifaceted, fine-tuned and tight post-transcriptional regulation of gene families as conserved through the majority of the mammalian lineage.
On the basis of integrated transcriptome analysis, we show that not all transcriptional start site clusters (TSCs) in the intergenic regions (iTSCs) have the same properties; thus, it is possible to discriminate the iTSCs that are likely to have biological relevance from the other noise-level iTSCs. We used a total of 251 933 381 short-read sequence tags generated from various types of transcriptome analyses in order to characterize 6039 iTSCs, which have significant expression levels. We analyzed and found that 23% of these iTSCs were located in the proximal regions of the RefSeq genes. These RefSeq-linked iTSCs showed similar expression patterns with the neighboring RefSeq genes, had widely fluctuating transcription start sites and lacked ordered nucleosome positioning. These iTSCs seemed not to form independent transcriptional units, simply representing the by-products of the neighboring RefSeq genes, in spite of their significant expression levels. Similar features were also observed for the TSCs located in the antisense regions of the RefSeq genes. Furthermore, for the remaining iTSCs that were not associated with any RefSeq genes, we demonstrate that integrative interpretation of the transcriptome data provides essential information to specify their biological functions in the hypoxic responses of the cells.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.