Abstract. Transcription factor binding to DNA regulatory regions is one of the primary mechanisms regulating gene expression levels. Position Weight Matrices (PWMs) offer a probabilistic model of protein-DNA interactions at the sequence level: they estimate the joint probability of a DNA binding site sequence by assuming that positions within the site are independent. Here we construct conditional PWMs that depend on the motif signatures in the flanking DNA sequence, by conditioning known binding site loci on the presence or absence of additional binding sites in the flanking sequence of each site's locus. Pooling known sites with similar flanking sequence patterns allows for the estimation of the conditional distribution function over the binding site sequences. We apply our model to the Dorsal transcription factor binding sites active in patterning the Dorsal-Ventral axis during Drosophila development. We find that those binding sites that cooperate with nearby Twist sites contain, on average, about 0.5 bits of information about the presence of Twist transcription factor binding sites in the flanking sequence. We also find that Dorsal binding site detectors conditioned on flanking sequence information discriminate Dorsal sites from background DNA better than detectors that ignore flanking sequence features.
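The idea of pooling sites by flanking context can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_pwm`, `log_odds`, `build_conditional_pwms`), the pseudocount of 1, the uniform background, and the toy sequences are all assumptions made for illustration, and the toy sequences are not real Dorsal sites. Each site carries a hypothetical flag saying whether a partner site (for example Twist) occurs in its flank; one PWM is estimated per pool, and a candidate sequence is scored by its log-odds against background under each pool.

```python
import math
from collections import defaultdict

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Estimate per-position base probabilities from equal-length aligned sites."""
    length = len(sites[0])
    pwm = []
    for i in range(length):
        # Start every base at the pseudocount so no probability is zero.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm

def log_odds(seq, pwm, background=None):
    """Log2-odds score of seq under the PWM versus a background model.

    Positional independence makes the score a sum over positions.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    return sum(math.log2(col[b] / background[b]) for col, b in zip(pwm, seq))

def build_conditional_pwms(labeled_sites, pseudocount=1.0):
    """Pool sites by a flanking-context flag and fit one PWM per pool."""
    pools = defaultdict(list)
    for seq, partner_in_flank in labeled_sites:
        pools[partner_in_flank].append(seq)
    return {flag: build_pwm(seqs, pseudocount) for flag, seqs in pools.items()}

# Hypothetical toy data: (site sequence, partner site present in flank?)
toy_sites = [
    ("GGAAT", True), ("GGAAT", True), ("GGAAC", True),
    ("GGTTT", False), ("GGTTC", False), ("GGTTT", False),
]

conditional = build_conditional_pwms(toy_sites)
score_with_partner = log_odds("GGAAT", conditional[True])
score_without_partner = log_odds("GGAAT", conditional[False])
print(score_with_partner, score_without_partner)
```

In this toy setting a sequence resembling the sites pooled under `True` scores higher with the `True` PWM than with the `False` one, which is the sense in which a detector conditioned on flanking context can separate sites more sharply than a single pooled PWM.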
Developmental gene regulation is characterized by complex networks of transcription factors and enhancers, but analysis of cis-regulatory information on a genomic level by direct experimentation remains a challenge. Quantitative modeling offers an alternative path to develop a global understanding of the transcriptional regulatory code. Recent studies have focused on endogenous regulatory sequences; however, distinct enhancers differ in many features, making it difficult to generalize to other cis-regulatory elements. To isolate the effects of factor spacing, stoichiometry, and arrangement, we applied a systematic, quantitative approach to 27 simpler regulatory elements that were analyzed in the context of the Drosophila blastoderm embryo. Using a database of almost 1000 images, we present here the first quantitative analysis of short-range transcriptional repressors, which play central roles in metazoan development. Our fractional occupancy-based modeling uncovered unexpected features of these proteins' activity that allow accurate predictions of regulation by the Giant, Knirps, Krüppel and Snail repressors, including modeling of endogenous enhancer sequences. This study provides essential elements of a transcriptional grammar that will allow extensive analysis of genomic information in Drosophila melanogaster and related organisms.