Genome-wide association studies (GWAS) have identified many noncoding variants associated with common diseases and traits. We show that these variants are concentrated in regulatory DNA marked by DNase I hypersensitive sites (DHSs). 88% of such DHSs are active during fetal development, and are enriched for gestational exposure-related phenotypes. We identify distant gene targets for hundreds of DHSs that may explain phenotype associations. Disease-associated variants systematically perturb transcription factor recognition sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrate tissue-selective enrichment of more weakly disease-associated variants within DHSs, and the de novo identification of pathogenic cell types for Crohn’s disease, multiple sclerosis, and an electrocardiogram trait, without prior knowledge of physiological mechanisms. Our results suggest pervasive involvement of regulatory DNA variation in common human disease, and provide pathogenic insights into diverse disorders.
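The claim that trait-associated variants are concentrated in DHSs can be pictured as a comparison of overlap fractions. The following is only a minimal sketch of that idea, not the authors' method: the function name `fraction_in_intervals`, the coordinates, and the variant lists are all hypothetical toy values, and a real analysis would handle multiple chromosomes, matched background sets, and a formal significance test.

```python
from bisect import bisect_right

def fraction_in_intervals(positions, intervals):
    """Fraction of positions inside any half-open [start, end) interval.

    Assumes a single chromosome and intervals that are sorted and
    non-overlapping (a simplification for illustration).
    """
    starts = [start for start, _ in intervals]
    hits = 0
    for pos in positions:
        # Index of the last interval whose start is <= pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits += 1
    return hits / len(positions)

# Hypothetical toy data, not taken from any study.
dhs = [(100, 250), (400, 550), (900, 1050)]
gwas_variants = [120, 410, 500, 700, 950]
background_variants = [10, 120, 300, 600, 700, 800, 1000, 1200, 1300, 1400]

gwas_frac = fraction_in_intervals(gwas_variants, dhs)
bg_frac = fraction_in_intervals(background_variants, dhs)
fold_enrichment = gwas_frac / bg_frac

print(f"GWAS variants in DHSs: {gwas_frac:.2f}")
print(f"Background variants in DHSs: {bg_frac:.2f}")
print(f"Fold enrichment: {fold_enrichment:.1f}")
```

A fold enrichment above 1 in such a comparison is what "concentrated in regulatory DNA" would look like numerically; the toy numbers here carry no biological meaning.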
DNaseI hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers, and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ~2.9 million DHSs that encompass virtually all known experimentally-validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation, and regulatory factor occupancy patterns. We connect ~580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is choreographed with dozens to hundreds of co-activated elements, and the trans-cellular DNaseI sensitivity pattern at a given region can predict cell type-specific functional behaviors. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation.
Summary As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
We have developed technologies for creating saturating libraries of sequence-defined transposon insertion mutants in which each strain is maintained. Phenotypic analysis of such libraries should provide a virtually complete identification of nonessential genes required for any process for which a suitable screen can be devised. The approach was applied to Pseudomonas aeruginosa, an opportunistic pathogen with a 6.3-Mbp genome. The library that was generated consists of 30,100 sequence-defined mutants, corresponding to an average of five insertions per gene. About 12% of the predicted genes of this organism lacked insertions; many of these genes are likely to be essential for growth on rich media. Based on statistical analyses and bioinformatic comparison to known essential genes in E. coli, we estimate that the actual number of essential genes is 300-400. Screening the collection for strains defective in two defined multigenic processes (twitching motility and prototrophic growth) identified mutants corresponding to nearly all genes expected from earlier studies. Thus, phenotypic analysis of the collection may produce essentially complete lists of genes required for diverse biological activities. The transposons used to generate the mutant collection have added features that should facilitate downstream studies of gene expression, protein localization, epistasis, and chromosome engineering.
Whole-genome sequences provide the foundation for the creation of relatively complete collections of strains carrying defined mutations in individual genes. Such libraries should facilitate the comprehensive identification of genes required for a wide range of biological processes. A nearly complete library of single-gene deletions of Saccharomyces cerevisiae has been assembled by an international consortium using a PCR-based mutagenesis approach (1). Other projects, also following a strategy of gene-by-gene disruption, are underway for Escherichia coli (E. coli genome project, www.genome.wisc.edu/functional/tnmutagenesis.htm), and have recently been completed for Bacillus subtilis (2).
An alternative strategy for generating mutant libraries consists of "random" whole-genome transposon-insertion mutagenesis followed by sequence-based identification of insertion sites. The approach is cost-effective and applicable to a wide variety of microbes (3, 4). Studies with yeast, in which a collection of mutants corresponding to about one-third of the genes were represented, have illustrated that the generation of large, arrayed collections of insertion mutants is feasible (5). Other studies with bacteria have analyzed large numbers of transposon insertion mutants to identify genes essential for growth, although the mutants were analyzed within populations rather than being archived in a format allowing additional phenotypes to be examined (6-8). In this report, we describe the generation and initial phenotypic analysis of a near-saturation library of transposon insertion mutants of the opportunistic pathogen Pseudomonas aeruginos...
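The saturation figures quoted in the abstract above (an average of five insertions per gene, about 12% of genes without insertions) can be sanity-checked with a simple Poisson model. The sketch below is a back-of-the-envelope illustration, not the statistical analysis the authors describe: it assumes insertions land uniformly at random and that every gene has the same mean of five, which ignores gene length and insertion hot spots. The function name is hypothetical.

```python
import math

def poisson_zero_fraction(mean_insertions_per_gene: float) -> float:
    """Probability that a gene receives zero insertions under a Poisson model."""
    return math.exp(-mean_insertions_per_gene)

mean_per_gene = 5.0       # average insertions per gene, from the abstract
observed_missing = 0.12   # fraction of genes lacking insertions, from the abstract

# Fraction of genes expected to be missed purely by chance (assumed uniform model).
expected_missing = poisson_zero_fraction(mean_per_gene)
excess_missing = observed_missing - expected_missing

print(f"Expected missing by chance: {expected_missing:.4f}")
print(f"Observed missing:           {observed_missing:.2f}")
print(f"Excess over chance:         {excess_missing:.4f}")
```

Under this uniform assumption fewer than 1% of genes would be missed by chance, well below the observed 12%, which fits the interpretation that many uninterrupted genes are essential. Short genes would be missed more often than the uniform model suggests, so the excess should be read as an upper bound on the essential fraction rather than an estimate of it.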
The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.
Inversions, deletions and insertions are important mediators of disease and disease susceptibility. We systematically compared the human genome reference sequence with a second genome (represented by fosmid paired-end sequences) to detect intermediate-sized structural variants >8 kb in length. We identified 297 sites of structural variation: 139 insertions, 102 deletions and 56 inversion breakpoints. Using combined literature, sequence and experimental analyses, we validated 112 of the structural variants, including several that are of biomedical relevance. These data provide a fine-scale structural variation map of the human genome and the requisite sequence precision for subsequent genetic studies of human disease.
Summary The human mitochondrial genome comprises a distinct genetic system transcribed as precursor polycistronic transcripts that are subsequently cleaved to generate individual mRNAs, tRNAs and rRNAs. Here we provide a comprehensive analysis of the human mitochondrial transcriptome across multiple cell lines and tissues. Using directional deep sequencing and parallel analysis of RNA ends, we demonstrate wide variation in mitochondrial transcript abundance and precisely resolve transcript processing and maturation events. We identify previously undescribed transcripts, including small RNAs, and observe the enrichment of several nuclear RNAs in mitochondria. Using high-throughput in vivo DNaseI footprinting, we establish the global profile of DNA-binding protein occupancy across the mitochondrial genome at single nucleotide resolution, revealing regulatory features at mitochondrial transcription initiation sites and functional insights into disease-associated variants. This integrated analysis of the mitochondrial transcriptome reveals unexpected complexity in the regulation, expression, and processing of mitochondrial RNA, and provides a resource for future studies of mitochondrial function (accessed at mitochondria.matticklab.com).