Transcription factor (TF) proteins recognize a small number of DNA sequences with high specificity and control the expression of neighbouring genes. The evolution of TF binding preference has been the subject of a number of recent studies, in which generalized binding profiles have been introduced and used to improve the prediction of new target sites. Generalized profiles are generated by aligning and merging the individual profiles of related TFs. However, the distance metrics and alignment algorithms used to compare the binding profiles have not yet been fully explored or optimized. As a result, binding profiles depend on TF structural information and may sometimes ignore important distinctions between subfamilies. Predicting the identity or the structural class of a protein that binds to a given DNA pattern will enhance the analysis of microarray and ChIP-chip data, in which multiple putative targets of usually unknown TFs are frequently predicted. We evaluate 105 combinations of comparison metrics and alignment algorithms. We find that local alignments are generally better than global alignments at detecting eukaryotic DNA motif similarities, especially when combined with the sum of squared distances or Pearson's correlation coefficient comparison metrics. In addition, multiple-alignment strategies for binding profiles and tree-building methods are tested for their efficiency in constructing generalized binding models. A new method for automatic determination of the optimal number of clusters is developed and applied in the construction of a new set of familial binding profiles that improves TF classification accuracy. A software tool, STAMP, is developed to host all tested methods and make them publicly available. This work provides a high-quality reference set of familial binding profiles and the first comprehensive platform for analysis of DNA profiles.
Detecting similarities between DNA motifs is a key step in the comparative study of transcriptional regulation, and the work presented here will form the basis for tool and method development for future transcriptional modeling studies.
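As a rough illustration of the column comparison metrics named above, the following Python sketch computes the sum of squared distances and Pearson's correlation coefficient between two profile columns, and uses the latter to score an ungapped local alignment of two profiles. It is not STAMP's implementation: the function names, the [A, C, G, T] column layout, the ungapped simplification, and the treatment of uniform columns as scoring zero are all assumptions made for this example.

```python
import math


def ssd(col_a, col_b):
    """Sum of squared distances between two columns (0 means identical)."""
    return sum((a - b) ** 2 for a, b in zip(col_a, col_b))


def pearson(col_a, col_b):
    """Pearson's correlation coefficient between two columns, in [-1, 1]."""
    mean_a = sum(col_a) / len(col_a)
    mean_b = sum(col_b) / len(col_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(col_a, col_b))
    var_a = sum((a - mean_a) ** 2 for a in col_a)
    var_b = sum((b - mean_b) ** 2 for b in col_b)
    if var_a == 0 or var_b == 0:
        # Assumption: a uniform column carries no signal, so score it as 0.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_ungapped_local(profile_a, profile_b, score=pearson):
    """Best-scoring contiguous overlap of two profiles, without gaps.

    Returns (total_score, start_a, start_b, length), with 0-based starts.
    The recurrence is Smith-Waterman restricted to the diagonal move.
    """
    best = (0.0, 0, 0, 0)
    # prev[j] holds (score, length) of the best run ending at the previous
    # column of profile_a and column j - 1 of profile_b.
    prev = [(0.0, 0)] * (len(profile_b) + 1)
    for i in range(1, len(profile_a) + 1):
        cur = [(0.0, 0)]
        for j in range(1, len(profile_b) + 1):
            total = prev[j - 1][0] + score(profile_a[i - 1], profile_b[j - 1])
            length = prev[j - 1][1] + 1
            if total <= 0:
                # A non-positive run is dropped, which makes the alignment local.
                total, length = 0.0, 0
            cur.append((total, length))
            if total > best[0]:
                best = (total, i - length, j - length, length)
        prev = cur
    return best


# Hypothetical profiles; each column is [A, C, G, T] frequencies.
motif_a = [
    [0.90, 0.03, 0.04, 0.03],
    [0.05, 0.05, 0.85, 0.05],
    [0.05, 0.05, 0.05, 0.85],
]
motif_b = [
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.70, 0.10, 0.10],
    [0.90, 0.03, 0.04, 0.03],
    [0.05, 0.05, 0.85, 0.05],
    [0.05, 0.05, 0.05, 0.85],
]

total, start_a, start_b, length = best_ungapped_local(motif_a, motif_b)
print(total, start_a, start_b, length)
```

Pearson's coefficient is used as the alignment score here because it is positive for similar columns and negative for dissimilar ones, which a local alignment needs in order to terminate a run. The sum of squared distances is a dissimilarity, so it would have to be converted into a similarity score before it could be passed as `score`; the conversion is left out because the text does not specify one.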
We have established a collection of 2460 lethal or semi-lethal mutant lines using a procedure thought to insert single P elements into vital genes on the third chromosome of Drosophila melanogaster. More than 1200 randomly selected lines were examined by in situ hybridization, and 90% were found to contain single insertions at sites that mark 89% of all lettered subdivisions of the Bridges' map. A set of chromosomal deficiencies that collectively uncover ~25% of the euchromatin of chromosome 3 reveals lethal mutations in 468 lines corresponding to 145 complementation groups. We undertook a detailed analysis of the cytogenetic interval 86E-87F and identified 87 P-element-induced mutations falling into 38 complementation groups, 16 of which correspond to previously known genes. Twenty-one of these 38 complementation groups have at least one allele that has a P-element insertion at a position consistent with the cytogenetics of the locus. We have rescued P elements and flanking chromosomal sequences from the 86E-87F region in 35 lines with either lethal or genetically silent P insertions, and used these as probes to identify cosmids and P1 clones from the Drosophila genome projects. This has tied together the physical and genetic maps and has linked 44 previously identified cosmid contigs into seven “super-contigs” that span the interval. STS data for sequences flanking one side of the P-element insertions in 49 lines have identified insertions in the αγ element at 87C, two known transposable elements, and the open reading frames of seven putative single-copy genes. These correspond to five known genes in this interval, and two genes identified by the homology of their predicted products to known proteins from other organisms.