We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
Human skin is a large, heterogeneous organ that protects the body from pathogens while sustaining microorganisms that influence human health and disease. Our analysis of 16S ribosomal RNA gene sequences obtained from 20 distinct skin sites of healthy humans revealed that physiologically comparable sites harbor similar bacterial communities. The complexity and stability of the microbial community are dependent on the specific characteristics of the skin site. This topographical and temporal survey provides a baseline for studies that examine the role of bacterial communities in disease states and the microbial interdependencies required to maintain healthy skin.The skin is a critical interface between the human body and its external environment, preventing loss of moisture and barring entry of pathogenic organisms (1). The skin is also an ecosystem, harboring microbial communities that live in a range of physiologically and topographically distinct niches (2). For example, hairy, moist underarms lie a short distance from smooth dry forearms, but these two niches are likely as ecologically dissimilar as rainforests are to deserts. Traditional culture-based characterizations of the skin microbiota are biased toward species that readily grow under standard laboratory conditions, such as Staphylococci spp. However, †To whom correspondence should be addressed. jsegre@nhgri.nih.gov. * See supporting online material for names of group members. Characterizing the microbiota that inhabit specific sites may provide insight into the delicate balance between skin health and disease. Certain dermatological disorders manifest at stereotypical skin sites [e.g., psoriasis on the outer elbow and atopic dermatitis (eczema) on the inner bend of the elbow]. Moreover, antibiotic exposure, modified hygienic practices, and lifestyle changes have the potential to alter the skin microbiome selectively and may underlie the increased incidence of human disorders such as atopic dermatitis. Understanding naturally occurring symbiotic microbial communities will provide insight into the conditions that favor the emergence of antibiotic-resistant organisms [e.g., the highly pathogenic strain of methicillin-resistant S. aureus, which acquired genes that promote growth on skin from the symbiont S. epidermidis (6)].We characterized the topographical and temporal diversity of the human skin microbiome with the use of 16S rRNA gene phylotyping, and generated 112,283 near-full-length bacterial 16S gene sequences from samples of 20 diverse skin sites on each of 10 healthy humans (7) (fig. S1 and table S1). Nineteen bacterial phyla were detected, but most sequences were assigned to four phyla: Actinobacteria (51.8%), Firmicutes (24.4%), Proteobacteria (16.5%), and Bacteroidetes (6.3%). Of the 205 identified genera represented by at least five sequences, three were associated with more than 62% of the sequences: Corynebacteria (22.8%; Actinobacteria), Propionibacteria (23.0%; Actinobacteria), and Staphylococci (16.8%; Firmicutes). At the species...
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
The many layers and structures of the skin serve as elaborate hosts to microbes, including a diversity of commensal and pathogenic bacteria that contribute to both human health and disease. To determine the complexity and identity of the microbes inhabiting the skin, we sequenced bacterial 16S small-subunit ribosomal RNA genes isolated from the inner elbow of five healthy human subjects. This analysis revealed 113 operational taxonomic units (OTUs; “phylotypes”) at the level of 97% similarity that belong to six bacterial divisions. To survey all depths of the skin, we sampled using three methods: swab, scrape, and punch biopsy. Proteobacteria dominated the skin microbiota at all depths of sampling. Interpersonal variation is approximately equal to intrapersonal variation when considering bacterial community membership and structure. Finally, we report strong similarities in the complexity and identity of mouse and human skin microbiota. This study of healthy human skin microbiota will serve to direct future research addressing the role of skin microbiota in health and disease, and metagenomic projects addressing the complex physiological interactions between the skin and the microbes that inhabit this environment.
Sex chromosomes are primary determinants of sexual dimorphism in many organisms. These chromosomes are thought to arise via the divergence of an ancestral autosome pair and are almost certainly influenced by differing selection in males and females. Exploring how sex chromosomes differ from autosomes is highly amenable to genomic analysis. We examined global gene expression in Drosophila melanogaster and report a dramatic underrepresentation of X-chromosome genes showing high relative expression in males. Using comparative genomics, we find that these same Xchromosome genes are exceptionally poorly conserved in the mosquito Anopheles gambiae. These data indicate that the X chromosome is a disfavored location for genes selectively expressed in males.
After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist 1,2. Here we present a human genome assembly that surpasses the continuity of GRCh38 2 , along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome 3 , we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes. Complete, telomere-to-telomere reference genome assemblies are necessary to ensure that all genomic variants are discovered and studied. At present, unresolved areas of the human genome are defined by multi-megabase satellite arrays in the pericentromeric regions and the ribosomal DNA arrays on acrocentric short arms, as well as regions enriched in segmental duplications that are greater than hundreds of kilobases in length and that exhibit sequence identity of more than 98% between paralogues. Owing to their absence from the reference, these repeat-rich sequences are often excluded from genetics and genomics studies, which limits the scope of association and functional analyses 4,5. Unresolved repeat sequences also result in unintended consequences; for example, paralogous sequence variants incorrectly being called as allelic variants 6 , and the contamination of bacterial gene databases 7. Completion of the entire human genome is expected to contribute to our understanding of chromosome function 8 , human disease 9 and genomic variation, which will improve technologies in biomedicine that use short-read mapping to a reference genome (for example, RNA sequencing (RNA-seq) 10 , chromatin immunoprecipitation followed by sequencing (ChIP-seq) 11 and assay for transposase-accessible chromatin using sequencing (ATAC-seq) 12). The fundamental challenge of reconstructing a genome from many comparatively short sequencing reads-a process known as genome assembly-is distinguishing the repeated sequences from one another 13. Resolving such repeats relies on sequencing reads that are long enough to span the entire repeat or accurate enough to distinguish each repeat copy on the basis of...
The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.
The systematic comparison of genomic sequences from different organisms represents a central focus of contemporary genome analysis. Comparative analyses of vertebrate sequences can identify coding and conserved non-coding regions, including regulatory elements, and provide insight into the forces that have rendered modern-day genomes. As a complement to whole-genome sequencing efforts, we are sequencing and comparing targeted genomic regions in multiple, evolutionarily diverse vertebrates. Here we report the generation and analysis of over 12 megabases (Mb) of sequence from 12 species, all derived from the genomic region orthologous to a segment of about 1.8 Mb on human chromosome 7 containing ten genes, including the gene mutated in cystic fibrosis. These sequences show conservation reflecting both functional constraints and the neutral mutational events that shaped this genomic region. In particular, we identify substantial numbers of conserved non-coding segments beyond those previously identified experimentally, most of which are not detectable by pair-wise sequence comparisons alone. Analysis of transposable element insertions highlights the variation in genome dynamics among these species and confirms the placement of rodents as a sister group to the primates.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.