Transcription factors (TFs) regulate the expression of genes involved in myriad cellular processes through sequence-specific interactions with DNA. In order to predict DNA regulatory elements and the TFs targeting them with greater accuracy, detailed knowledge of the binding preferences of TFs is needed. Protein binding microarray (PBM) technology permits rapid, high-throughput characterization of the in vitro DNA binding specificities of proteins (1). Here, we present a novel, maximally compact, synthetic DNA sequence design that represents all possible DNA sequence variants of a given length k (i.e., all "k-mers") on a single, universal microarray. We constructed such all k-mer microarrays covering all 10 base pair (bp) binding sites by converting high-density single-stranded oligonucleotide arrays to double-stranded DNA arrays. Using these microarrays, we comprehensively determined the binding specificities over a full range of affinities for five TFs of diverse structural classes from yeast, worm, mouse, and human. Importantly, the unbiased coverage of all k-mers permits an interrogation of binding site preferences, including nucleotide interdependencies, at unprecedented resolution.
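As an illustration of how every k-mer can be packed into a compact sequence, the sketch below builds a de Bruijn sequence over the DNA alphabet. The abstract above does not name the construction it uses, so treating a de Bruijn sequence as the design is an assumption here; the function names `de_bruijn` and `covers_all_kmers` are hypothetical, and the code uses the standard textbook recursive construction, not anything taken from the work itself.

```python
def de_bruijn(alphabet: str, k: int) -> str:
    """Return a cyclic de Bruijn sequence of order k over `alphabet`.

    Read cyclically, the result contains every length-k word exactly once.
    Standard recursive construction; illustrative only.
    """
    n = len(alphabet)
    a = [0] * (n * k)   # working buffer of symbol indices
    seq = []            # symbol indices of the output sequence

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in seq)


def covers_all_kmers(linear: str, alphabet: str, k: int) -> bool:
    """Check that a linear sequence over `alphabet` contains every k-mer."""
    seen = {linear[i:i + k] for i in range(len(linear) - k + 1)}
    return len(seen) == len(alphabet) ** k


# Small demonstration with k = 4 (the abstract describes k = 10).
cyclic = de_bruijn("ACGT", 4)
# Append the first k-1 bases so the cyclic sequence can be read linearly.
linear = cyclic + cyclic[:3]
print(len(linear), covers_all_kmers(linear, "ACGT", 4))
```

A linearized sequence of this kind has length 4^k + k - 1, which is the shortest possible string containing all 4^k k-mers. How such a sequence would be divided across individual array probes is not covered by this sketch.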
Whole-genome mRNA quantitation can be used to identify the genes that are most responsive to environmental or genotypic change. By searching for mutually similar DNA elements among the upstream non-coding DNA sequences of these genes, we can identify candidate regulatory motifs and corresponding candidate sets of coregulated genes. We have tested this strategy by applying it to three extensively studied regulatory systems in the yeast Saccharomyces cerevisiae: galactose response, heat shock, and mating type. Galactose-response data yielded the known binding site of Gal4, and six of nine genes known to be induced by galactose. Heat shock data yielded the cell-cycle activation motif, which is known to mediate cell-cycle dependent activation, and a set of genes coding for all four nucleosomal proteins. Mating type alpha and a data yielded all of the four relevant DNA motifs and most of the known a- and alpha-specific genes.
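The idea of looking for shared elements in the upstream regions of responsive genes can be sketched with a simple k-mer over-representation count. This is a deliberately simplified stand-in and not the motif-finding method used in the work summarized above, which the abstract does not specify. The function names, the pseudocount, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def kmer_presence(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def enriched_kmers(foreground, background, k, pseudocount=1.0):
    """Rank k-mers by how much more often they occur in foreground sequences.

    The score is the ratio of smoothed presence frequencies
    (foreground / background); the pseudocount avoids division by zero.
    """
    fg = kmer_presence(foreground, k)
    bg = kmer_presence(background, k)
    scores = {}
    for kmer, n_fg in fg.items():
        f = (n_fg + pseudocount) / (len(foreground) + 2 * pseudocount)
        b = (bg[kmer] + pseudocount) / (len(background) + 2 * pseudocount)
        scores[kmer] = f / b
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up toy sequences, not real yeast promoters.
responsive_upstream = ["AACGGTT", "TTCGGAA", "GACGGAC"]
other_upstream = ["AAAAAAA", "TTTTTTT", "GAGAGAG"]
print(enriched_kmers(responsive_upstream, other_upstream, 3)[:3])
```

Exact k-mer counting misses degenerate positions in a binding site, so a realistic analysis would use an alignment-based or probabilistic motif model instead.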
The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.
Exposure to carcinogenic alkylating agents, oxidizing agents, and ionizing radiation modulates transcript levels for over one third of Saccharomyces cerevisiae's 6,200 genes. Computational analysis delineates groups of coregulated genes whose upstream regions bear known and novel regulatory sequence motifs. One group of coregulated genes contains a number of DNA excision repair genes (including the MAG1 3-methyladenine DNA glycosylase gene) and a large selection of protein degradation genes. Moreover, transcription of these genes is modulated by the proteasome-associated protein Rpn4, most likely via its binding to MAG1 upstream repressor sequence 2-like elements, which turn out to be almost identical to the recently identified proteasome-associated control element (G. Mannhaupt, R. Schnall, V. Karpov, I. Vetter, and H. Feldmann, FEBS Lett. 450:27-34, 1999). We have identified a large number of genes whose transcription is influenced by Rpn4p.
Biological processes depend upon the structural integrity of the molecules that comprise living organisms. The structural integrity of the genome is particularly important because molecular alterations in the genetic material, usually DNA, can lead to permanent inheritable changes, i.e., mutations. However, the structural integrity of other cellular molecules, such as proteins, RNA, carbohydrates, and lipids, is also important, because the precise three-dimensional shape and the detailed chemistry of these molecules orchestrate the biochemical processes vital for life. Most biomolecules are inherently reactive, and as such their structural integrity is constantly challenged by reactive chemical and physical agents in the environment. It should therefore come as no surprise that all cells can sense and respond to unfavorable molecular alterations.
Indeed, it is well known that cells sense and respond to damaged DNA and proteins, and such responses are exemplified by the SOS and heat shock responses that have been well characterized in Escherichia coli and other organisms (11,12,28).
Here we explore the transcriptional response of Saccharomyces cerevisiae to a wide range of chemical and physical damaging agents. Specifically, we explore how transcript levels for every S. cerevisiae gene and open reading frame (ORF) respond when cellular molecules are damaged by a selection of environmentally and clinically relevant chemical and physical carcinogens. The global transcriptional response of this budding yeast to these damaging agents turns out to be far more extensive than anticipated. However, computational analysis of almost 200,000 data points reveals patterns in the data that allow us to define novel regulatory networks. We find that the responses of S. cerevisiae to each of six damaging agents are markedly different and that, for at least one agent, the response is dramatically affected by the cell's position in the cell cycle at the time of exposure. Computational clustering of the data and subsequent searching for common sequence motifs in promoter regions reveal nine such m...
Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board-approved “open consent” process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain; we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research.
We describe a method of genome-wide analysis of quantitative growth phenotypes using insertional mutagenesis and DNA microarrays. We applied the method to assess the fitness contributions of Escherichia coli gene domains under specific growth conditions. A transposon library was subjected to competitive growth selection in Luria-Bertani (LB) and in glucose minimal media. Transposon-containing genomic DNA fragments from the selected libraries were compared with the initial unselected transposon insertion library on DNA microarrays to identify insertions that affect fitness. Genes involved in the biosynthesis of nutrients not provided in the growth medium were found to be significantly enriched in the set of genes containing negatively selected insertions. The data also identify fitness contributions of several uncharacterized genes, including putative transcriptional regulators and enzymes. The applicability of this high-resolution array selection in other species is discussed.
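The comparison of selected and unselected libraries described above can be pictured as a per-probe log ratio of hybridization signals. The sketch below is a minimal illustration under stated assumptions: the probe names, intensity values, signal floor, and the threshold of -1 are all hypothetical, and the abstract does not describe the normalization or statistics actually used.

```python
import math


def log2_ratios(selected, unselected, floor=1.0):
    """Per-probe log2(selected / unselected) signal ratio.

    Signals below `floor` are raised to `floor` so the ratio stays defined.
    Only probes present in both measurements are compared.
    """
    return {
        probe: math.log2(max(selected[probe], floor) / max(unselected[probe], floor))
        for probe in unselected
        if probe in selected
    }


def negatively_selected(ratios, threshold=-1.0):
    """Probes whose signal dropped at least 2**abs(threshold)-fold after selection."""
    return sorted(p for p, r in ratios.items() if r <= threshold)


# Made-up intensities for three hypothetical probes.
unselected = {"probe_1": 100.0, "probe_2": 200.0, "probe_3": 200.0}
selected = {"probe_1": 100.0, "probe_2": 25.0, "probe_3": 400.0}
ratios = log2_ratios(selected, unselected)
print(negatively_selected(ratios))
```

A strongly negative ratio suggests that insertions near that probe were depleted during competitive growth, which is the signature of a fitness cost in the tested medium.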