Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly ‘housekeeping’, whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs. The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses. The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research.
Mammalian promoters can be separated into two classes, conserved TATA box-enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3' UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.
This study describes comprehensive polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome. We identify the 5' and 3' boundaries of 181,047 transcripts with extensive variation in transcripts arising from alternative promoter usage, splicing, and polyadenylation. There are 16,247 new mouse protein-coding transcripts, including 5154 encoding previously unidentified proteins. Genomic mapping of the transcriptome reveals transcriptional forests, with overlapping transcription on both strands, separated by deserts in which few transcripts are observed. The data provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.
A balanced (1;11)(q42.1;q14.3) translocation segregates with schizophrenia and related psychiatric disorders in a large Scottish family (maximum LOD = 6.0). We hypothesize that the translocation is the causative event and that it directly disrupts gene function. We previously reported a dearth of genes in the breakpoint region of chromosome 11 and it is therefore unlikely that the expression of any genes on this chromosome has been affected by the translocation. By contrast, the corresponding region on chromosome 1 is gene dense and not one but two novel genes are directly disrupted by the translocation. These genes have been provisionally named Disrupted-In-Schizophrenia 1 and 2 (DISC1 and DISC2). DISC1 encodes a large protein with no significant sequence homology to other known proteins. It is predicted to consist of a globular N-terminal domain(s) and a helical C-terminal domain which has the potential to form a coiled-coil by interaction with another, as yet unidentified, protein(s). Similar structures are thought to be present in a variety of unrelated proteins that are known to function in the nervous system. The putative structure of the protein encoded by DISC1 is therefore compatible with a role in the nervous system. DISC2 apparently specifies a non-coding RNA molecule that is antisense to DISC1, an arrangement that has been observed at other loci where it is thought that the antisense RNA is involved in regulating expression of the sense gene. Altogether, these observations indicate that DISC1 and DISC2 should be considered formal candidate genes for susceptibility to psychiatric illness.
Difficulties in fine-mapping quantitative trait loci (QTLs) are a major impediment to progress in the molecular dissection of complex traits in mice. Here we show that genome-wide high-resolution mapping of multiple phenotypes can be achieved using a stock of genetically heterogeneous mice. We developed a conservative and robust bootstrap analysis to map 843 QTLs with an average 95% confidence interval of 2.8 Mb. The QTLs contribute to variation in 97 traits, including models of human disease (asthma, type 2 diabetes mellitus, obesity and anxiety) as well as immunological, biochemical and hematological phenotypes. The genetic architecture of almost all phenotypes was complex, with many loci each contributing a small proportion to the total variance. Our data set, freely available at http://gscan.well.ox.ac.uk, provides an entry point to the functional characterization of genes involved in many complex traits.
The presence of ribonucleotides in genomic DNA is undesirable given their increased susceptibility to hydrolysis. Ribonuclease (RNase) H enzymes that recognize and process such embedded ribonucleotides are present in all domains of life. However, in unicellular organisms such as budding yeast, they are not required for viability or even efficient cellular proliferation, while in humans, RNase H2 hypomorphic mutations cause the neuroinflammatory disorder Aicardi-Goutières syndrome. Here, we report that RNase H2 is an essential enzyme in mice, required for embryonic growth from gastrulation onward. RNase H2 null embryos accumulate large numbers of single (or di-) ribonucleotides embedded in their genomic DNA (>1,000,000 per cell), resulting in genome instability and a p53-dependent DNA-damage response. Our findings establish RNase H2 as a key mammalian genome surveillance enzyme required for ribonucleotide removal and demonstrate that ribonucleotides are the most commonly occurring endogenous nucleotide base lesion in replicating cells.
The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.
Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.