Cys2-His2 zinc finger (C2H2-ZF) proteins represent the largest class of putative human transcription factors. However, for most C2H2-ZF proteins it is unknown whether they even bind DNA or, if they do, to which sequences. Here, by combining data from a modified bacterial one-hybrid system with protein-binding microarray and chromatin immunoprecipitation analyses, we show that natural C2H2-ZFs encoded in the human genome bind DNA both in vitro and in vivo, and we infer the DNA recognition code using DNA-binding data for thousands of natural C2H2-ZF domains. In vivo binding data are generally consistent with our recognition code and indicate that C2H2-ZF proteins recognize more motifs than all other human transcription factors combined. We provide direct evidence that most KRAB-containing C2H2-ZF proteins bind specific endogenous retroelements (EREs), ranging from currently active to ancient families. The majority of C2H2-ZF proteins, including KRAB proteins, also show widespread binding to regulatory regions, indicating that the human genome contains an extensive and largely unstudied adaptive C2H2-ZF regulatory network that targets a diverse range of genes and pathways.
Significance: Using D-amino acids as the building blocks for bioactive peptides can dramatically increase their potency. However, simply swapping regular levorotatory amino acids for dextrorotatory (D)-amino acids alters the peptide surface topology, and function is lost. Current methods to overcome this are not generally applicable and exclude the majority of therapeutic targets. By creating a mirror image of all 111,867 protein structures in the Protein Data Bank (PDB), we convert this repository into a D-peptide database with 2.8 million D-peptide structures. This D-PDB can be searched to find therapeutically active topologies, demonstrated here by the discovery of D-peptide GLP1R and PTH1R agonists. Evaluation of D-PDB coverage suggests that it holds candidates for most therapeutic targets and, thus, potentially contains hundreds of potent drug leads.
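The mirror-image step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it only assumes that a structure is a list of (x, y, z) atom coordinates and that reflecting through one plane (here, negating x) yields the mirror image. The function names `mirror` and `signed_volume` and the four toy coordinates are hypothetical, chosen for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]

def mirror(coords: List[Point]) -> List[Point]:
    """Reflect every atom through the yz-plane by negating its x coordinate."""
    return [(-x, y, z) for x, y, z in coords]

def signed_volume(center: Point, a: Point, b: Point, c: Point) -> float:
    """Scalar triple product of the three vectors from center to a, b, c.

    The sign encodes the handedness of the arrangement around the center.
    """
    u = tuple(p - q for p, q in zip(a, center))
    v = tuple(p - q for p, q in zip(b, center))
    w = tuple(p - q for p, q in zip(c, center))
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            + u[1] * (v[2] * w[0] - v[0] * w[2])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

# Toy chiral center (hypothetical coordinates, not taken from any PDB entry):
# a central atom with three distinct neighbours.
original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
reflected = mirror(original)

chirality_before = signed_volume(*original)
chirality_after = signed_volume(*reflected)

# Reflection preserves every interatomic distance...
distances_before = [math.dist(original[0], p) for p in original[1:]]
distances_after = [math.dist(reflected[0], p) for p in reflected[1:]]
```

Reflection leaves bond lengths unchanged but flips the sign of the handedness measure, which is why a mirrored L-protein structure can stand in as a D-peptide template.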
Background: The C2H2 zinc finger (C2H2-ZF) is the most numerous protein domain in many metazoans, but is not as frequent or diverse in other eukaryotes. The biochemical and evolutionary mechanisms that underlie the diversity of this DNA-binding domain exclusively in metazoans are, however, mostly unknown. Results: Here, we show that the C2H2-ZF expansion in metazoans is facilitated by contribution of non-base-contacting residues to DNA binding energy, allowing base-contacting specificity residues to mutate without catastrophic loss of DNA binding. In contrast, C2H2-ZF DNA binding in fungi, plants, and other lineages is constrained by reliance on base-contacting residues for DNA-binding functionality. Reconstructions indicate that virtually every DNA triplet was recognized by at least one C2H2-ZF domain in the common progenitor of placental mammals, but that extant C2H2-ZF domains typically bind different sequences from these ancestral domains, with changes facilitated by non-base-contacting residues. Conclusions: Our results suggest that the evolution of C2H2-ZFs in metazoans was expedited by the interaction of non-base-contacting residues with the DNA backbone. We term this phenomenon “kaleidoscopic evolution,” to reflect the diversity of both binding motifs and binding motif transitions and the facilitation of their diversification. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-017-1287-y) contains supplementary material, which is available to authorized users.
Development of an accurate protein–DNA recognition code that can predict DNA specificity from protein sequence is a central problem in biology. C2H2 zinc fingers constitute by far the largest family of DNA-binding domains, and their binding specificity has been studied intensively. However, despite decades of research, accurate prediction of DNA specificity remains elusive. A major obstacle is thought to be the inability of current methods to account for the influence of neighbouring domains. Here we show that this problem can be addressed using a structural approach: we build structural models for all C2H2-ZF–DNA complexes with known binding motifs and find six distinct binding modes. Each mode changes the orientation of specificity residues with respect to the DNA, thereby modulating base preference. Most importantly, the structural analysis shows that residues at the domain interface strongly and predictably influence the binding mode, and hence specificity. Accounting for the predicted binding mode significantly improves the accuracy of predicted motifs. This new insight into the fundamental behaviour of C2H2-ZFs has implications both for improving the prediction of natural zinc finger-binding sites and for prioritizing further experiments to complete the code. It also provides a new design feature for zinc finger engineering.
Most rare clinical missense variants cannot currently be classified as pathogenic or benign. Deficiency in human 5,10-methylenetetrahydrofolate reductase (MTHFR), the most common inherited disorder of folate metabolism, is caused primarily by rare missense variants. Further complicating variant interpretation, variant impacts often depend on environment. An important example of this phenomenon is the MTHFR variant p.Ala222Val (c.665C>T), which is carried by half of all humans and has a phenotypic impact that depends on dietary folate. Here we describe the results of 98,336 variant functional-impact assays, covering nearly all possible MTHFR amino acid substitutions in four folinate environments, each in the presence and absence of p.Ala222Val. The resulting atlas of MTHFR variant effects reveals many complex dependencies on both folinate and p.Ala222Val. MTHFR atlas scores can distinguish pathogenic from benign variants and, among individuals with severe MTHFR deficiency, correlate with age of disease onset. Providing a powerful tool for understanding structure-function relationships, the atlas suggests a role for a disordered loop in retaining cofactor at the active site and identifies variants that enable escape of inhibition by S-adenosylmethionine. Thus, a model based on eight MTHFR variant effect maps illustrates how shifting landscapes of environment- and genetic-background-dependent missense variation can inform our clinical, structural, and functional understanding of MTHFR deficiency.