A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional ∼12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
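The coverage and polymorphism figures quoted in this abstract can be sanity-checked with simple arithmetic. The sketch below uses only the rounded numbers printed above; the variable and function names are illustrative, not from the paper. Applying the average rate of 1 difference per 1250 bp uniformly across the consensus is an assumption, since the abstract itself notes marked heterogeneity across the genome.

```python
# Figures as printed in the abstract (rounded there, so results are approximate).
READS = 27_271_853                # high-quality sequence reads
SEQUENCED_BP = 14_800_000_000     # total sequenced bases ("14.8-billion bp")
CONSENSUS_BP = 2_910_000_000      # euchromatic consensus ("2.91-billion bp")
SNP_SPACING_BP = 1250             # one difference per 1250 bp, on average


def fold_coverage(sequenced_bp, genome_bp):
    """Fold coverage = total sequenced bases / genome length."""
    return sequenced_bp / genome_bp


# Average read length implied by the two totals (about 543 bp).
mean_read_length = SEQUENCED_BP / READS

# Coverage from the rounded totals (about 5.09-fold).
coverage = fold_coverage(SEQUENCED_BP, CONSENSUS_BP)

# Expected differences between two random haploid genomes, ASSUMING the
# average rate applies uniformly over the whole consensus sequence.
expected_pairwise_differences = CONSENSUS_BP / SNP_SPACING_BP

print(f"mean read length: {mean_read_length:.0f} bp")
print(f"coverage: {coverage:.2f}-fold")
print(f"expected pairwise differences: {expected_pairwise_differences:,.0f}")
```

With the rounded inputs the coverage comes out slightly below the reported 5.11-fold; the gap presumably reflects rounding of the printed totals or a slightly different genome-size denominator, which is a guess rather than something the abstract states.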
The forkhead box (Fox) family of transcription factors, which originated in unicellular eukaryotes, has expanded over time through multiple duplication events, and sometimes through gene loss, to over 40 members in mammals. Fox genes have evolved to acquire a specialized function in many key biological processes. Mutations in Fox genes have a profound effect on human disease, causing phenotypes as varied as cancer, glaucoma and language disorders. We summarize the salient features of the evolution of the Fox gene family and highlight the diverse contribution of various Fox subfamilies to developmental processes, from organogenesis to speech acquisition.
The forkhead box, or Fox, gene family of transcriptional regulators is an evolutionarily ancient gene family that is named after the Drosophila melanogaster gene fork head (fkh). Mutations in fkh cause defects in head fold involution during embryogenesis, resulting in a characteristic spiked head appearance in adult flies [1]. Hundreds of Fox genes have been identified in species ranging from yeasts to humans, and have been classified into subfamilies, such as FoxA and FoxP. Genetic analyses have shown that many of these genes have important biological functions in multiple species, from control of the cell cycle to differentiation of epithelia, and from placental development to formation of the inner ear. The evolutionary conservation of the crucial DNA-binding domain between orthologous members of the Fox gene family is remarkable; for example, there is 90% amino acid similarity between the D. melanogaster Fork head and the human FOXA1 protein. Several Fox genes are mutated in human disease, with phenotypes ranging from defective T cell differentiation to speech impediments [2,3]. Recent findings on the contribution of FOXA1-mediated gene regulation in breast and prostate cancer (see below) further show the large contribution of this gene family to human health [4,5].
In this Review, we first describe the evolution and phylogeny of this fascinating gene family. It is impossible to review in detail the contribution of all of the Fox genes to development and function, and for a short summary sketch on the entire gene family we refer the reader to two recent 'snapshots' (REFS 2,3). However, to indicate the breadth of function of the Fox gene family in more detail, we focus on three Fox classes (FoxO, FoxA and FoxP) because each of these classes shows a unique and important aspect of the diverse biology of this gene family. Over the past 20 years, genetic studies in organisms ranging from flies to humans have informed us of the essential biological functions of these representative Fox genes. Although in many cases the molecular details of how these factors select and control their targets, as well as the upstream regulation of these factors, remain unknown, their mutational analysis is nearing
Genomes frequently evolve by reversals $\rho(i,j)$ that transform a gene order $\pi_1 \ldots \pi_i \pi_{i+1} \ldots \pi_{j-1} \pi_j \ldots \pi_n$ into $\pi_1 \ldots \pi_i \pi_{j-1} \ldots \pi_{i+1} \pi_j \ldots \pi_n$. Reversal distance between permutations $\pi$ and $\sigma$ is the minimum number of reversals needed to transform $\pi$ into $\sigma$. Analysis of genome rearrangements in molecular biology started in the late 1930s, when Dobzhansky and Sturtevant published a milestone paper presenting a rearrangement scenario with 17 inversions between the species of Drosophila. Analysis of genomes evolving by inversions leads to a combinatorial problem of sorting by reversals studied in detail recently. We study sorting of signed permutations by reversals, a problem that adequately models rearrangements in small genomes like chloroplast or mitochondrial DNA. The previously suggested approximation algorithms for sorting signed permutations by reversals compute the reversal distance between permutations with an astonishing accuracy for both simulated and biological data. We prove a duality theorem explaining this intriguing performance and show that there exists a “hidden” parameter that allows one to compute the reversal distance between signed permutations in polynomial time.
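To make the definitions in this abstract concrete, here is a minimal Python sketch of a reversal and of reversal distance. It follows the abstract's 1-based convention, in which the reversal with indices i and j reverses the block from position i+1 to position j-1 and leaves the elements at positions i and j in place. The function names are hypothetical, and the distance routine is a brute-force breadth-first search that is only practical for very short permutations; it is not the polynomial-time algorithm the paper announces. The sign flip for signed permutations is the standard convention and is an assumption here, since the abstract does not spell it out.

```python
from collections import deque


def reversal(perm, i, j, signed=False):
    """Apply the reversal (i, j): reverse positions i+1 .. j-1 (1-based).

    i may be 0 and j may be len(perm) + 1, so a prefix, a suffix, or the
    whole permutation can be reversed. With signed=True the signs of the
    reversed elements are flipped as well (assumed convention).
    """
    if not (0 <= i < j <= len(perm) + 1):
        raise ValueError("need 0 <= i < j <= len(perm) + 1")
    perm = tuple(perm)
    block = perm[i:j - 1][::-1]          # 0-based slice of positions i+1 .. j-1
    if signed:
        block = tuple(-x for x in block)
    return perm[:i] + block + perm[j - 1:]


def reversal_distance(src, dst, signed=False):
    """Minimum number of reversals turning src into dst, by brute-force BFS."""
    src, dst = tuple(src), tuple(dst)
    n = len(src)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return dist[cur]
        for i in range(n):
            for j in range(i + 2, n + 2):    # block holds at least one element
                nxt = reversal(cur, i, j, signed)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    raise ValueError("dst cannot be reached from src by reversals")


print(reversal((1, 2, 3, 4, 5), 1, 5))                      # (1, 4, 3, 2, 5)
print(reversal_distance((3, 1, 2), (1, 2, 3)))               # 2
print(reversal_distance((-2, -1), (1, 2), signed=True))      # 1
```

The search visits every reachable permutation in the worst case, so its cost grows factorially with length; it is useful only for checking small cases by hand.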
Retroviruses differ in their preferences for sites for viral DNA integration in the chromosomes of infected cells. Human immunodeficiency virus (HIV) integrates preferentially within active transcription units, whereas murine leukemia virus (MLV) integrates preferentially near transcription start sites and CpG islands. We investigated the viral determinants of integration-site selection using HIV chimeras with MLV genes substituted for their HIV counterparts. We found that transferring the MLV integrase (IN) coding region into HIV (to make HIVmIN) caused the hybrid to integrate with a specificity close to that of MLV. Addition of MLV gag (to make HIVmGagmIN) further increased the similarity of target-site selection to that of MLV. A chimeric virus with MLV Gag only (HIVmGag) displayed targeting preferences different from those of both HIV and MLV, further implicating Gag proteins in targeting as well as IN. We also report a genome-wide analysis indicating that MLV, but not HIV, favors integration near DNase I–hypersensitive sites (i.e., ±1 kb), and that HIVmIN and HIVmGagmIN also favored integration near these features. These findings reveal that IN is the principal viral determinant of integration specificity; they also reveal a new role for Gag-derived proteins, and strengthen models for integration targeting based on tethering of viral IN proteins to host proteins.
Retroviral vectors are often used to introduce therapeutic sequences into patients' cells. In recent years, gene therapy with retroviral vectors has had impressive therapeutic successes, but has also resulted in three cases of leukaemia caused by insertional mutagenesis, which has focused attention on the molecular determinants of retroviral-integration target-site selection. Here, we review retroviral DNA integration, with emphasis on recent genome-wide studies of targeting and on the status of efforts to modulate target-site selection.
DNA sequences from retroviruses, retrotransposons, DNA transposons, and parvoviruses can all become integrated into the human genome. Accumulation of such sequences accounts for at least 40% of our genome today. These integrating elements are also of interest as gene-delivery vectors for human gene therapy. Here we present a comprehensive bioinformatic analysis of integration targeting by HIV, MLV, ASLV, SFV, L1, SB, and AAV. We used a mathematical method which allowed annotation of each base pair in the human genome for its likelihood of hosting an integration event by each type of element, taking advantage of more than 200 types of genomic annotation. This bioinformatic resource documents a wealth of new associations between genomic features and integration targeting. The study also revealed that the length of genomic intervals analyzed strongly affected the conclusions drawn—thus, answering the question “What genomic features affect integration?” requires carefully specifying the length scale of interest.
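The point about length scale can be illustrated with a small sketch: the fraction of integration sites counted as "near" a genomic feature depends on the window used. The code below is not the mathematical method used in the study; it is a plain windowed-proximity count on a single chromosome, and the coordinates are made-up toy values chosen only to show the effect. The function name `fraction_near` is hypothetical.

```python
from bisect import bisect_left


def fraction_near(sites, features, window):
    """Fraction of sites with at least one feature within +/- window bp.

    All coordinates are assumed to lie on the same chromosome.
    """
    if not sites:
        return 0.0
    feats = sorted(features)
    hits = 0
    for s in sites:
        # First feature at or beyond the left edge of the window.
        k = bisect_left(feats, s - window)
        if k < len(feats) and feats[k] <= s + window:
            hits += 1
    return hits / len(sites)


# Toy coordinates (NOT real data): integration sites and annotated features.
toy_sites = [1_000, 5_000, 20_000, 50_000]
toy_features = [1_500, 9_000, 48_000]

for window in (1_000, 5_000):
    print(window, fraction_near(toy_sites, toy_features, window))
# 1000 0.25
# 5000 0.75
```

The same sites and features give a weak association at 1 kb and a strong one at 5 kb, which is why a statement about features affecting integration needs the interval length stated alongside it.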