We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
Significant fractions of eukaryotic genomes give rise to RNA, much of which is unannotated and has reduced protein-coding potential. The genomic origins and the associations of human nuclear and cytosolic polyadenylated RNAs longer than 200 nucleotides (nt) and whole-cell RNAs less than 200 nt were investigated in this genome-wide study. Subcellular addresses for nucleotides present in detected RNAs were assigned, and their potential processing into short RNAs was investigated. Taken together, these observations suggest a novel role for some unannotated RNAs as primary transcripts for the production of short RNAs. Three potentially functional classes of RNAs have been identified, two of which are syntenically conserved and correlate with the expression state of protein-coding genes. These data support a highly interleaved organization of the human transcriptome.
In Western Europe and the United States approximately 1 in 12 women develop breast cancer. A small proportion of breast cancer cases, in particular those arising at a young age, are attributable to a highly penetrant, autosomal dominant predisposition to the disease. The breast cancer susceptibility gene, BRCA2, was recently localized to chromosome 13q12-q13. Here we report the identification of a gene in which we have detected six different germline mutations in breast cancer families that are likely to be due to BRCA2. Each mutation causes serious disruption to the open reading frame of the transcriptional unit. The results indicate that this is the BRCA2 gene.
Sites of transcription of polyadenylated and nonpolyadenylated RNAs for 10 human chromosomes were mapped at 5-base pair resolution in eight cell lines. Unannotated, nonpolyadenylated transcripts comprise the major proportion of the transcriptional output of the human genome. Of all transcribed sequences, 19.4, 43.7, and 36.9% were observed to be polyadenylated, nonpolyadenylated, and bimorphic, respectively. Half of all transcribed sequences are found only in the nucleus and for the most part are unannotated. Overall, the transcribed portions of the human genome are predominantly composed of interlaced networks of both poly A+ and poly A- annotated transcripts and unannotated transcripts of unknown function. This organization has important implications for interpreting genotype-phenotype associations, regulation of gene expression, and the definition of a gene.
Using high-density oligonucleotide arrays representing essentially all nonrepetitive sequences on human chromosomes 21 and 22, we map the binding sites in vivo for three DNA binding transcription factors, Sp1, cMyc, and p53, in an unbiased manner. This mapping reveals an unexpectedly large number of transcription factor binding site (TFBS) regions, with a minimal estimate of 12,000 for Sp1, 25,000 for cMyc, and 1600 for p53 when extrapolated to the full genome. Only 22% of these TFBS regions are located at the 5' termini of protein-coding genes while 36% lie within or immediately 3' to well-characterized genes and are significantly correlated with noncoding RNAs. A significant number of these noncoding RNAs are regulated in response to retinoic acid, and overlapping pairs of protein-coding and noncoding RNAs are often coregulated. Thus, the human genome contains roughly comparable numbers of protein-coding and noncoding genes that are bound by common transcription factors and regulated by common environmental signals.
A first-generation fluctuating charge (FQ) force field intended ultimately for protein simulations is presented. The electrostatic model parameters (atomic hardnesses and electronegativities) are parameterized by fitting to DFT-based charge responses of small molecules perturbed by a dipolar probe mimicking a water dipole. The nonbonded parameters for atoms based on the CHARMM atom-typing scheme are determined by simultaneously optimizing vacuum water-solute geometries and energies (for a set of small organic molecules) and condensed phase properties (densities and vaporization enthalpies) for pure bulk liquids. Vacuum solute-water geometries, specifically hydrogen bond distances, are fit to 0.19 Å r.m.s. error, while dimerization energies are fit to 0.98 kcal/mol r.m.s. error. Properties of the liquids studied include bulk liquid structure and polarization. The FQ model does indeed show a condensed phase effect in the shifting of molecular dipole moments to higher values relative to the gas phase. The FQ liquids also appear to be more strongly associated, in the case of hydrogen bonding liquids, due to the enhanced dipolar interactions, as evidenced by shifts toward lower energies in pair energy distributions. We present results from a short simulation of NMA in bulk TIP4P-FQ water as a step towards simulating solvated peptide/protein systems. As expected, there is a nontrivial dipole moment enhancement of the NMA (although the quantitative accuracy is difficult to assess). Furthermore, the distribution of dipole moments of water molecules in the vicinity of the solutes is shifted towards larger values by 0.1-0.2 Debye, in keeping with previously reported work.
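To illustrate how atomic electronegativities and hardnesses determine charges in a fluctuating charge model, the following is a minimal two-site sketch of electronegativity equalization. It is not the parameterization or functional form used in the work summarized above. It assumes a generic quadratic charge energy E = chi_a*q_a + chi_b*q_b + 0.5*eta_a*q_a^2 + 0.5*eta_b*q_b^2 + j_ab*q_a*q_b with a neutral molecule (q_a + q_b = 0). The function names, the optional external potentials phi_a and phi_b (standing in for a perturbing probe), and all numerical values are hypothetical and chosen only for illustration.

```python
def two_site_fq_charges(chi_a, chi_b, eta_a, eta_b, j_ab, phi_a=0.0, phi_b=0.0):
    """Return (q_a, q_b) minimizing the assumed quadratic charge energy.

    Assumed energy (illustrative convention, arbitrary consistent units):
        E = (chi_a + phi_a)*q_a + (chi_b + phi_b)*q_b
            + 0.5*eta_a*q_a**2 + 0.5*eta_b*q_b**2 + j_ab*q_a*q_b
    subject to overall neutrality, q_a + q_b = 0.
    """
    # Substituting q_b = -q_a and setting dE/dq_a = 0 gives a linear equation.
    denom = eta_a + eta_b - 2.0 * j_ab
    if denom <= 0.0:
        raise ValueError("energy is not bounded below for these parameters")
    q_a = ((chi_b + phi_b) - (chi_a + phi_a)) / denom
    return q_a, -q_a


def site_chemical_potentials(q_a, q_b, chi_a, chi_b, eta_a, eta_b, j_ab,
                             phi_a=0.0, phi_b=0.0):
    """Return (dE/dq_a, dE/dq_b); these are equal at the optimal charges."""
    mu_a = chi_a + phi_a + eta_a * q_a + j_ab * q_b
    mu_b = chi_b + phi_b + eta_b * q_b + j_ab * q_a
    return mu_a, mu_b


if __name__ == "__main__":
    # Hypothetical parameters: site b is the more electronegative site.
    params = dict(chi_a=1.0, chi_b=3.0, eta_a=4.0, eta_b=6.0, j_ab=1.0)
    q_a, q_b = two_site_fq_charges(**params)
    print("unperturbed charges:", q_a, q_b)
    print("chemical potentials:", site_chemical_potentials(q_a, q_b, **params))

    # A hypothetical external potential shifts the charges (polarization response).
    q_a_p, q_b_p = two_site_fq_charges(phi_a=0.5, phi_b=-0.5, **params)
    print("perturbed charges:", q_a_p, q_b_p)
```

In this toy model the more electronegative site acquires the negative charge, and the change in charges under an applied potential is the kind of charge response that the parameters would be fit to reproduce.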
A fluctuating charge (FQ) force field is applied to molecular dynamics simulations for six small proteins in explicit polarizable solvent represented by the TIP4P-FQ potential. The proteins include 1FSV, 1ENH, 1PGB, 1VII, 1H8K, and 1CRN, representing both helical and beta-sheet secondary structural elements. Constant pressure and temperature (NPT) molecular dynamics simulations are performed on time scales of several nanoseconds, the longest simulations yet reported using explicitly polarizable all-atom empirical potentials (for both solvent and protein) in the condensed phase. In terms of structure, the FQ force field allows deviations from native structure up to 2.5 Å (with a range of 1.0 to 2.5 Å). This is commensurate with the performance of the CHARMM22 nonpolarizable model and other currently existing polarizable models. Importantly, secondary structural elements maintain native structure in general to within 1 Å (both helix and beta-strands), again in good agreement with the nonpolarizable case. In qualitative agreement with QM/MM ab initio dynamics on crambin (Liu et al. Proteins 2001, 44, 484), there is a sequence dependence of average condensed phase atomic charge for all proteins, a dependence one would anticipate considering the differing chemical environments around individual atoms; this is a subtle quantum mechanical feature captured in the FQ model but absent in current state-of-the-art nonpolarizable models. Furthermore, there is a mutual polarization of solvent and protein in the condensed phase. Solvent dipole moment distributions within the first and second solvation shells around the protein display a shift towards higher dipole moments (increases on the order of 0.2-0.3 Debye) relative to the bulk; protein polarization is manifested via the enhanced condensed phase charges of typical polar atoms such as backbone carbonyl oxygens, amide nitrogens, and amide hydrogens. Finally, to enlarge the sample set of proteins, gas-phase minimizations and 1 ps constant temperature simulations are performed on various-sized proteins to compare to earlier work by Kaminski et al. (J Comp Chem 2002, 23, 1515). The present work establishes the feasibility of applying a fully polarizable force field for protein simulations and demonstrates the approach employed in extending the CHARMM force field to include these effects.