Chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP-array) has become a popular procedure for studying genome-wide protein-DNA interactions and transcription regulation. However, it can map the probable protein-DNA interaction loci only to a resolution of 1-2 kilobases. To pinpoint interaction sites down to the base-pair level, we introduce a computational method, Motif Discovery scan (MDscan), that examines the ChIP-array-selected sequences and searches for DNA sequence motifs representing the protein-DNA interaction sites. MDscan combines the advantages of two widely adopted motif search strategies, word enumeration and position-specific weight matrix updating, and incorporates the ChIP-array ranking information to accelerate searches and enhance their success rates. MDscan correctly identified all the experimentally verified motifs from published ChIP-array experiments in yeast (STE12, GAL4, RAP1, SCB, MCB, MCM1, SFF, and SWI5), and predicted two motif patterns for the differential binding of Rap1 protein in telomere regions. In our studies, the method was faster and more accurate than several established motif-finding algorithms. MDscan can be used to find DNA motifs not only in ChIP-array experiments but also in other experiments in which a subgroup of the sequences can be inferred to contain relatively abundant motif sites. The MDscan web server can be accessed at http://BioProspector.stanford.edu/MDscan/.
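The idea of seeding a motif search from words found in the highest-ranked sequences can be illustrated with a short sketch. This is a simplified illustration and not the published MDscan implementation: the function names, the parameters `w` (motif width), `m` (minimum number of agreeing positions), and `n_top` (number of top-ranked sequences used for seeding), and the toy sequences are all assumptions made for the example. It omits candidate scoring and the later refinement step that uses lower-ranked sequences.

```python
from collections import Counter


def wmers(seq, w):
    """Yield every length-w substring of seq."""
    for i in range(len(seq) - w + 1):
        yield seq[i:i + w]


def matches(a, b):
    """Number of positions at which two equal-length words agree."""
    return sum(x == y for x, y in zip(a, b))


def seed_candidates(ranked_seqs, w, m, n_top):
    """Enumerate w-mers in the n_top highest-ranked sequences (hypothetical helper).

    For each distinct w-mer (the seed), collect every segment from those
    same sequences that agrees with the seed at m or more positions.
    Returns a dict mapping seed -> list of similar segments.
    """
    top = ranked_seqs[:n_top]
    segments = [s for seq in top for s in wmers(seq, w)]
    return {
        seed: [s for s in segments if matches(seed, s) >= m]
        for seed in set(segments)
    }


def count_matrix(segments, w):
    """Position-specific base counts for a list of aligned segments."""
    return [Counter(s[i] for s in segments) for i in range(w)]


if __name__ == "__main__":
    # Toy input, ordered from strongest to weakest enrichment (assumed ranking).
    ranked = ["TTGACGTCAT", "GGTGACGTCA", "ATGACGTAAC", "CCCCCCCCCC"]
    cands = seed_candidates(ranked, w=6, m=5, n_top=3)
    for seed in sorted(cands, key=lambda k: -len(cands[k]))[:3]:
        print(seed, len(cands[seed]), count_matrix(cands[seed], 6))
```

Restricting enumeration to the top-ranked sequences keeps the candidate set small, which is one plausible reading of how ranking information can speed up a search; the counts per position are the raw material from which a weight matrix could then be built.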
We propose MOTIF REGRESSOR for discovering sequence motifs upstream of genes that undergo expression changes in a given condition. The method combines the advantages of matrix-based motif finding and oligomer motif-expression regression analysis, resulting in high sensitivity and specificity. MOTIF REGRESSOR is particularly effective in discovering expression-mediating motifs of medium to long width with multiple degenerate positions. When applied to Saccharomyces cerevisiae, MOTIF REGRESSOR identified the ROX1 and YAP1 motifs from Rox1p and Yap1p overexpression experiments, respectively; predicted that Gcn4p may have increased activity in YAP1 deletion mutants; reported a group of motifs (including GCN4, PHO4, MET4, STRE, USR1, RAP1, M3A, and M3B) that may mediate the transcriptional response to amino acid starvation; and found all of the known cell-cycle regulation motifs from 18 expression microarrays over two cell cycles.
sequence motif discovery | microarray data | correlation | transcription regulation
Direct experimental determination of transcription factor DNA-binding motifs (TFBM) is not practical or efficient in many biological systems. Therefore, computational algorithms such as the word-enumeration (1-4), the position-specific matrix update (5-7), and the dictionary (8) methods have been developed to identify putative motifs and guide experimentation. One of the most successful computational tactics for TFBM discovery is to cluster genes based on their expression profiles, and then search for motifs in the sequences upstream of tightly clustered genes (9). When noise is introduced into the cluster through spurious correlations, however, such an approach may result in false positives. A filtering method (10) based on the specificity of the motif occurrences has been shown to effectively eliminate false positives, but the sensitivity of the algorithm is still low in some cases.
An iterative procedure for simultaneous clustering and motif finding has been suggested (11), but no effective algorithm has been implemented to demonstrate its advantage in biological data. Two novel methods for TFBM discovery via the association of gene expression values with oligomer motif abundances have been proposed (12, 13). They first conduct word enumeration and then use regression to check whether the genes whose upstream sequences contain a set of words have significant changes in their expression. These methods are effective for discovering conserved short motifs and sometimes interactions among them, but are not effective with longer motifs and may lose sensitivity in cases where TFBMs have multiple degenerate positions.
We present an alternative approach operating under the explicit assumption that, in response to a given biological condition, the effect of a TFBM is strongest among genes with the most dramatic increase or decrease in mRNA expression. We first use a fast and sensitive motif-finding method, MDSCAN (14), to generate a large set of motif candidates that are enriched in the DNA sequence upstream o...
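The general enumerate-then-regress strategy described above for the earlier methods (12, 13) can be sketched in a few lines. This is only an illustration under stated assumptions and does not reproduce either cited method or MOTIF REGRESSOR itself: it fits a separate single-variable least-squares line for each candidate word, and the helper names (`count_sites`, `simple_regression`, `rank_words`), the gene identifiers, and the toy sequences and log-ratios are all hypothetical.

```python
def count_sites(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    w = len(word)
    return sum(seq[i:i + w] == word for i in range(len(seq) - w + 1))


def simple_regression(x, y):
    """Ordinary least squares fit of y = a + b*x.

    Returns (a, b, r_squared). If x or y has no variance, the slope
    and r_squared are reported as 0.0.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return my, 0.0, 0.0
    b = sxy / sxx
    return my - b * mx, b, sxy ** 2 / (sxx * syy)


def rank_words(upstream, log_ratio, words):
    """Regress expression on site counts, one word at a time.

    upstream:  dict gene -> upstream DNA sequence
    log_ratio: dict gene -> expression log-ratio in the condition
    Returns a list of (word, slope, r_squared), best fit first.
    """
    genes = sorted(upstream)
    y = [log_ratio[g] for g in genes]
    results = []
    for word in words:
        x = [count_sites(upstream[g], word) for g in genes]
        _, slope, r2 = simple_regression(x, y)
        results.append((word, slope, r2))
    return sorted(results, key=lambda t: -t[2])


if __name__ == "__main__":
    # Toy data (assumed for illustration only).
    upstream = {"g1": "TTACGTACGT", "g2": "ACGTTTTT", "g3": "TTTTTTTT"}
    log_ratio = {"g1": 2.0, "g2": 1.0, "g3": 0.0}
    for word, slope, r2 in rank_words(upstream, log_ratio, ["ACGT", "TTTT"]):
        print(word, round(slope, 3), round(r2, 3))
```

Because each exact word is treated as a separate predictor, a motif with several degenerate positions is split across many words, each supported by only a few genes, which is consistent with the loss of sensitivity noted above.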