Structural Alignment of Protein–DNA Interfaces: Insights into the Determinants of Binding Specificity

Siggers, Trevor; Silkov, Antonina

doi:10.1016/j.jmb.2004.11.010

Cited by 64 publications

(64 citation statements)

References 62 publications


“…If several structures had comparable homology scores, we chose either the most accurate one (using measures such as resolution of x-ray diffraction) or the one most relevant in the biological context (using information about cofactors and the dimerization state). Computing the score only from amino acids that contact DNA, rather than from entire aligned sequences, assumes that amino acid-DNA interactions are local: if the amino acids at the DNA-binding interface are conserved between two protein-DNA complexes, they will adopt similar geometric arrangements with respect to DNA, regardless of the rest of the protein (7,47,48). For example, a comparison of the engrailed and α2 homeodomain-DNA complexes revealed an extensive set of conserved contacts with DNA, even though the amino acid sequences were only 27% identical (7).…”

Section: Methods (mentioning)

confidence: 99%

“…For example, a comparison of the engrailed and α2 homeodomain-DNA complexes revealed an extensive set of conserved contacts with DNA, even though the amino acid sequences were only 27% identical (7). A more recent study (48) identified a number of cases in which the local interface geometry was conserved, even if DNA conformational change was required in order to accommodate it.…”

Section: Methods (mentioning)

confidence: 99%


Connecting protein structure with predictions of regulatory sites

Morozov

Siggia

2007

Proc. Natl. Acad. Sci. U.S.A.


A common task posed by microarray experiments is to infer the binding site preferences for a known transcription factor from a collection of genes that it regulates and to ascertain whether the factor acts alone or in a complex. The converse problem can also be posed: Given a collection of binding sites, can the regulatory factor or complex of factors be inferred? Both tasks are substantially facilitated by using relatively simple homology models for protein-DNA interactions, as well as the rapidly expanding protein structure database. For budding yeast, we are able to construct reliable structural models for 67 transcription factors and with them redetermine factor binding sites by using a Bayesian Gibbs sampling algorithm and an extensive protein localization data set. For 49 factors in common with a prior analysis of this data set (based largely on phylogenetic conservation), we find that half of the previously predicted binding motifs are in need of some revision. We also solve the inverse problem of ascertaining the factors from the binding sites by assigning a correct protein fold to 25 of the 49 cases from a previous study. Our approach is easily extended to other organisms, including higher eukaryotes. Our study highlights the utility of enlarging current structural genomics projects that exhaustively sample fold structure space to include all factors with significantly different DNA-binding specificities.

Keywords: protein-DNA interactions | homology models of transcription factors | weight matrix predictions

Transcription factors (TFs) are regulatory proteins used by the cell to activate or repress gene transcription. They interact with short nucleotide sequences, typically located upstream of a gene, by means of the DNA-binding domains that recognize their cognate binding sites. As a rule, regulation of gene transcription is analyzed by the bioinformatics methods designed to detect statistically overrepresented motifs in promoter sequences. Intergenic sequences bound by the TF can be identified by using DNA microarray technology, including chromatin immunoprecipitation (ChIP-chip) (1, 2), protein binding (3), and DNA immunoprecipitation (DIP-chip) arrays (4). Of special note is a recent genome-wide study that used ChIP-chip analysis to profile in vivo genomic occupancies for 203 DNA-binding transcriptional regulators in Saccharomyces cerevisiae (2). Using these data, the authors predicted binding specificities for 65 TFs by using the genomes of related species; a number that was later increased to 98 by MacIsaac et al. (5). The DNA-binding domains of TFs can be classified into a limited number of structural families (6, 7). Structural studies of the protein-DNA complexes reveal that, within each family, the overall fold of the DNA-binding domain and its mode of interaction with the cognate binding site are remarkably conserved, resulting in a characteristic pattern of amino acid contacts with DNA bases. These interactions form the basis of the sequence-specific direct readout of nucleotide sequences by amino acid...




“…Alternatively, using the X-ray structure as a template, researchers can use direct molecular modeling of the TF-DNA interface to compute the change in binding free energy when the DNA sequence is mutated (26,61). Structure-based classification of protein-DNA interaction surfaces can also provide insight into the determinants of binding specificity (77).…”

Section: Using Structural Information (mentioning)

confidence: 99%

Predictive Modeling of Genome-Wide mRNA Expression: From Modules to Molecules

Bussemaker

Foat

Ward

2007

Annu. Rev. Biophys. Biomol. Struct.


Various algorithms are available for predicting mRNA expression and modeling gene regulatory processes. They differ in whether they rely on the existence of modules of coregulated genes or build a model that applies to all genes, whether they represent regulatory activities as hidden variables or as mRNA levels, and whether they implicitly or explicitly model the complex cis-regulatory logic of multiple interacting transcription factors binding the same DNA. The fact that functional genomics data of different types reflect the same molecular processes provides a natural strategy for integrative computational analysis. One promising avenue toward an accurate and comprehensive model of gene regulation combines biophysical modeling of the interactions among proteins, DNA, and RNA with the use of large-scale functional genomics data to estimate regulatory network connectivity and activity parameters. As the ability of these models to represent complex cis-regulatory logic increases, the need for approaches based on cross-species conservation may diminish.


“…First, with some exceptions, homologous protein-DNA complexes tend to exhibit similar docking geometries, allowing for some conservation of contact patterns.¹³ Second, within multispecific families, nonspecific contacts to the backbone are well conserved, while sequence positions making direct contacts to nucleotide bases show high variability.¹² Third, comparisons of protein and cognate DNA sequences within families, including mutual information and evolutionary trace analyses, have revealed protein-DNA sequence covariations and correlations in evolutionary importance that correspond to known or probable direct contacts.…”

Section: Introduction (mentioning)

confidence: 99%