Chromatin is important for the regulation of transcription and other functions, yet the diversity of chromatin composition and its distribution along chromosomes are still poorly characterized. By integrative analysis of genome-wide binding maps of 53 broadly selected chromatin components in Drosophila cells, we show that the genome is segmented into five principal chromatin types that are defined by unique, yet overlapping combinations of proteins, and form domains that can extend over >100 kb. We identify a repressive chromatin type that covers about half of the genome and lacks classic heterochromatin markers. Furthermore, transcriptionally active euchromatin consists of two types that differ in molecular organization and H3K36 methylation, and regulate distinct classes of genes. Finally, we provide evidence that the different chromatin types help to target DNA-binding factors to specific genomic regions. These results provide a global view of chromatin diversity and domain organization in a metazoan cell.
Members of transcription factor families typically have similar DNA binding specificities yet execute unique functions in vivo. Transcription factors often bind DNA as multiprotein complexes, raising the possibility that complex formation might modify their DNA binding specificities. To test this hypothesis, we developed an experimental and computational platform, SELEX-seq, that can be used to determine the relative affinities to any DNA sequence for any transcription factor complex. Applying this method to all eight Drosophila Hox proteins, we show that they obtain novel recognition properties when they bind DNA with the dimeric cofactor Extradenticle-Homothorax (Exd). Exd-Hox specificities group into three main classes that obey Hox gene collinearity rules, and DNA structure predictions suggest that anterior and posterior Hox proteins prefer DNA sequences with distinct minor groove topographies. Together, these data suggest that emergent DNA recognition properties revealed by interactions with cofactors contribute to transcription factor specificities in vivo.
We present here a new computational method for discovering cis-regulatory elements that circumvents the need to cluster genes based on their expression profiles. Based on a model in which upstream motifs contribute additively to the log-expression level of a gene, this method requires a single genome-wide set of expression ratios and the upstream sequence for each gene, and outputs statistically significant motifs. Analysis of publicly available expression data for Saccharomyces cerevisiae reveals several new putative regulatory elements, some of which plausibly control the early, transient induction of genes during sporulation. Known motifs generally have high statistical significance.
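The additive model described above can be illustrated with a minimal sketch. It assumes the simplest possible case: a single candidate motif, whose occurrence count in each upstream sequence is fitted by ordinary least squares against the log-expression ratio. The function names, the toy sequences, and the use of R² as a goodness-of-fit measure are illustrative assumptions, not the published implementation.

```python
def count_motif(sequence, motif):
    """Count occurrences of motif in sequence, allowing overlaps."""
    width = len(motif)
    return sum(
        1
        for i in range(len(sequence) - width + 1)
        if sequence[i:i + width] == motif
    )


def fit_single_motif(upstream_seqs, log_ratios, motif):
    """Least-squares fit of: log_ratio ~ baseline + slope * motif_count.

    Returns (baseline, slope, r_squared). This is a one-motif toy version
    of an additive model; several motifs would add one term each.
    """
    counts = [count_motif(seq, motif) for seq in upstream_seqs]
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(log_ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in counts)
    syy = sum((y - mean_y) ** 2 for y in log_ratios)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, log_ratios))
    if sxx == 0 or syy == 0:
        # Motif count (or expression) does not vary: nothing to explain.
        return mean_y, 0.0, 0.0
    slope = sxy / sxx
    baseline = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return baseline, slope, r_squared


# Hypothetical toy data: four genes, their upstream sequences,
# and one log-expression ratio per gene.
upstream_seqs = ["ACGTACGT", "TTTTTTTT", "ACGTTTTT", "GGGGGGGG"]
log_ratios = [2.0, 0.0, 1.0, 0.0]

baseline, slope, r_squared = fit_single_motif(upstream_seqs, log_ratios, "ACGT")
```

In this sketch, a motif whose count explains a large share of the variance in the log ratios would be a candidate regulatory element. No clustering of genes is needed, because every gene contributes one data point to the fit.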
We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 25-55 million years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species, but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement and high coadaptation of both male genes and cis-regulatory sequences emerge as important themes of genome divergence between these species of Drosophila.
Genomic analyses often involve scanning for potential transcription-factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein’s binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For 9 TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro–derived motifs performed similarly to motifs derived from in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices learned by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10%). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.
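To make the idea of scanning with a mononucleotide position weight matrix concrete, here is a minimal sketch. It assumes a log-odds matrix built from per-position base counts with a pseudocount and a uniform background; the function names, the pseudocount value, and the toy count matrix are illustrative assumptions and do not correspond to any of the 26 approaches compared in the study.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log2-odds weight matrix.

    counts is a list with one dict per motif position, mapping each
    observed base to its count. Missing bases are treated as zero.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; returns (start, score) pairs.

    Each position contributes independently to the score, which is the
    defining assumption of a mononucleotide model.
    """
    width = len(pwm)
    return [
        (start, sum(pwm[j][sequence[start + j]] for j in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


def best_site(sequence, pwm):
    """Return the (start, score) pair of the highest-scoring window."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])


# Hypothetical count matrix for a 3-bp site that strongly prefers TGA.
toy_counts = [{"T": 8}, {"G": 8}, {"A": 8}]
toy_pwm = log_odds_pwm(toy_counts)
start, score = best_site("CCTGACC", toy_pwm)
```

More complex models mentioned in the abstract relax the independence between positions, for example by scoring dinucleotides. The sketch covers only the simple case and handles only sequences made up of A, C, G, and T.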
Reduced insulin/IGF-1-like signaling (IIS) extends C. elegans lifespan by upregulating stress response (Class I) and downregulating other (Class II) genes through a mechanism that depends on the conserved transcription factor DAF-16/FOXO. By integrating genomewide mRNA expression responsiveness to DAF-16 with genomewide in vivo binding data for a compendium of transcription factors, we discovered that PQM-1 is the elusive transcriptional activator that directly controls development (Class II) genes by binding to the DAF-16 associated element (DAE). DAF-16 directly regulates Class I genes only, through the DAF-16 binding element (DBE). Loss of PQM-1 suppresses daf-2 longevity and further slows development. Surprisingly, the nuclear localization of PQM-1 and DAF-16 is controlled by IIS in opposite ways, and was also found to be mutually antagonistic. We observe progressive loss of nuclear PQM-1 with age, explaining declining expression of PQM-1 targets. Together, our data suggest an elegant mechanism for balancing stress response and development.
The Myc/Max/Mad transcription factor network is critically involved in cell behavior; however, there is relatively little information on its genomic binding sites. We have employed the DamID method to carry out global genomic mapping of the Drosophila Myc, Max, and Mad/Mnt proteins. Each protein was tethered to Escherichia coli DNA adenine-methyltransferase (Dam) permitting methylation proximal to in vivo binding sites in Kc cells. Microarray analyses of methylated DNA fragments reveal binding to multiple loci on all major Drosophila chromosomes. This approach also reveals dynamic interactions among network members as we find that increased levels of dMax influence the extent of dMyc, but not dMnt, binding. Computer analysis using the REDUCE algorithm demonstrates that binding regions correlate with the presence of E-boxes, CG repeats, and other sequence motifs. The surprisingly large number of directly bound loci (∼15% of coding regions) suggests that the network interacts widely with the genome. Furthermore, we employ microarray expression analysis to demonstrate that hundreds of DamID-binding loci correspond to genes whose expression is directly regulated by dMyc in larvae. These results suggest that a fundamental aspect of Max network function involves widespread binding and regulation of gene expression. [Keywords: myc; mad; Drosophila; target genes; transcription] Supplemental material is available at http://parma.fhcrc.org/AOryan.