Although the proteins that read the gene regulatory code, transcription factors (TFs), have been largely identified, it is not well known which sequences TFs can recognize. We have analyzed the sequence-specific binding of human TFs using high-throughput SELEX and ChIP sequencing. A total of 830 binding profiles were obtained, describing 239 distinctly different binding specificities. The models represent the majority of human TFs, approximately doubling the coverage compared to existing systematic studies. Our results reveal additional specificity determinants for a large number of factors for which a partial specificity was known, including a commonly observed A- or T-rich stretch that flanks the core motifs. Global analysis of the data revealed that homodimer orientation and spacing preferences, and base-stacking interactions, have a larger role in TF-DNA binding than previously appreciated. We further describe a binding model incorporating these features that is required to understand binding of TFs to DNA.
The majority of CpG dinucleotides in the human genome are methylated at cytosine bases. However, active gene regulatory elements are generally hypomethylated relative to their flanking regions, and the binding of some transcription factors (TFs) is diminished by methylation of their target sequences. By analysis of 542 human TFs with methylation-sensitive SELEX (systematic evolution of ligands by exponential enrichment), we found that there are also many TFs that prefer CpG-methylated sequences. Most of these are in the extended homeodomain family. Structural analysis showed that homeodomain specificity for methylcytosine depends on direct hydrophobic interactions with the methylcytosine 5-methyl group. This study provides a systematic examination of the effect of an epigenetic DNA modification on human TF binding specificity and reveals that many developmentally important proteins display preference for mCpG-containing sequences.
Introduction
Common variable immunodeficiency (CVID) is the most common primary immunodeficiency in adults.¹ Recurrent bacterial infections of the respiratory tract are the clinical hallmark present in nearly all patients.² In addition, up to 40% of the patients show gastrointestinal disease, concomitant lymphoproliferative disorders, autoimmune phenomena, or granulomatous inflammation.² The pathogenic understanding of antibody deficiency in humans has always been hampered by the great heterogeneity of the syndrome.³ In 1966, Rosen and Janeway started to group antibody deficiencies by their mode of inheritance.⁴ In 1973, Cooper included the clinical course and serum immunoglobulin levels, thereby separating hyper-IgM syndromes and selective IgA deficiency.⁵ The remaining group of still very heterogeneous antibody deficiencies was termed CVID. Consecutive attempts to subclassify CVID by B-cell function in vitro⁶,⁷ failed to reach diagnostic acceptance because of laborious and poorly standardized procedures and a lack of clinical relevance.
In 2002, we and others suggested a flow cytometric classification of CVID according to the B-cell phenotype.⁸,⁹ The abnormalities of circulating B cells in patients with CVID had already been recognized earlier,¹⁰ but only with the ease and the broad availability of flow cytometry was a widespread and systematic analysis of these aberrations possible. The Freiburg classification divided patients into 3 groups by analyzing the expression of IgM, IgD, CD27 and CD21.⁸ Group 1 was characterized by a severe reduction of switched memory B cells (IgD−IgM−CD27+ less than 0.4% of lymphocytes), while group 2, representing 25% of the analyzed CVID patients, exhibited nearly normal numbers of class-switched memory B cells, suggesting a post-germinal center defect.
The online version of this article contains a data supplement.
Methods
Patients
All patients were diagnosed as having CVID based on the European Society for Immunodeficiencies/Pan-American Group for Immunodeficiency (ESID/PAGID) criteria,¹¹ including a marked decrease of IgG (at least 2 standard deviations [SDs] below the mean for age) and a marked decrease in at least one of the isotypes IgM or IgA, the onset of clinically significant immunodeficiency at greater than 2 years of age, and the exclusion of defined causes of hypogammaglobulinemia (see also www.esid.org). Not all patients have been evaluated for absent isohemagglutinins and/or poor response to vaccines. For the final evaluation of B-cell phenotyping, the following exclusion criteria were adopted: patients younger than 6 years of age at the time of flow cytometric evaluation, patients on immunosuppressive treatment, patients currently suffering from malignancies, and patients with less than 1% peripheral B cells. Altogether, 303 patients of origi...
Counting individual RNA or DNA molecules is difficult because they are hard to copy quantitatively for detection. To overcome this limitation, we applied unique molecular identifiers (UMIs), which make each molecule in a population distinct, to genome-scale human karyotyping and mRNA sequencing in Drosophila melanogaster. Use of this method can improve accuracy of almost any next-generation sequencing method, including chromatin immunoprecipitation-sequencing, genome assembly, diagnostics and manufacturing-process control and monitoring.
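The counting principle behind UMIs can be illustrated with a minimal sketch. It assumes reads have already been reduced to (gene, UMI) pairs; the function name `count_molecules` and the toy reads are hypothetical and not taken from the study, and no correction for sequencing errors in the UMIs is attempted.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per gene.

    Reads that share both gene and UMI are treated as amplification
    copies of one original molecule and are counted once.
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


# Toy input (hypothetical): geneA was sequenced three times,
# but two of the reads carry the same UMI.
reads = [
    ("geneA", "ACGT"),
    ("geneA", "ACGT"),
    ("geneA", "TTGA"),
    ("geneB", "CCAT"),
]
counts = count_molecules(reads)
print(counts)
```

Counting set sizes rather than raw reads is what makes the estimate insensitive to uneven amplification: a molecule copied a thousand times still contributes a single UMI.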
Members of the large ETS family of transcription factors (TFs) have highly similar DNA-binding domains (DBDs)—yet they have diverse functions and activities in physiology and oncogenesis. Some differences in DNA-binding preferences within this family have been described, but they have not been analysed systematically, and their contributions to targeting remain largely uncharacterized. We report here the DNA-binding profiles for all human and mouse ETS factors, which we generated using two different methods: a high-throughput microwell-based TF DNA-binding specificity assay, and protein-binding microarrays (PBMs). Both approaches reveal that the ETS-binding profiles cluster into four distinct classes, and that all ETS factors linked to cancer, ERG, ETV1, ETV4 and FLI1, fall into just one of these classes. We identify amino-acid residues that are critical for the differences in specificity between all the classes, and confirm the specificities in vivo using chromatin immunoprecipitation followed by sequencing (ChIP-seq) for a member of each class. The results indicate that even relatively small differences in in vitro binding specificity of a TF contribute to site selectivity in vivo.
Homozygosity for the G allele of rs6983267 at 8q24 increases colorectal cancer (CRC) risk approximately 1.5 fold. We report here that the risk allele G shows copy number increase during CRC development. Our computer algorithm, Enhancer Element Locator (EEL), identified an enhancer element that contains rs6983267. The element drove expression of a reporter gene in a pattern that is consistent with regulation by the key CRC pathway Wnt. rs6983267 affects a binding site for the Wnt-regulated transcription factor TCF4, with the risk allele G showing stronger binding in vitro and in vivo. Genome-wide ChIP assay revealed the element as the strongest TCF4 binding site within 1 Mb of MYC. An unambiguous correlation between rs6983267 genotype and MYC expression was not detected, and additional work is required to scrutinize all possible targets of the enhancer. Our work provides evidence that the common CRC predisposition associated with 8q24 arises from enhanced responsiveness to Wnt signaling.
Gene expression is regulated by transcription factors (TFs), proteins that recognize short DNA sequence motifs. Such sequences are very common in the human genome, and an important determinant of the specificity of gene expression is the cooperative binding of multiple TFs to closely located motifs. However, interactions between DNA-bound TFs have not been systematically characterized. To identify TF pairs that bind cooperatively to DNA, and to characterize their spacing and orientation preferences, we have performed consecutive affinity-purification systematic evolution of ligands by exponential enrichment (CAP-SELEX) analysis of 9,400 TF-TF-DNA interactions. This analysis revealed 315 TF-TF interactions recognizing 618 heterodimeric motifs, most of which have not been previously described. The observed cooperativity occurred promiscuously between TFs from diverse structural families. Structural analysis of the TF pairs, including a novel crystal structure of MEIS1 and DLX3 bound to their identified recognition site, revealed that the interactions between the TFs were predominantly mediated by DNA. Most TF pair sites identified involved a large overlap between individual TF recognition motifs, and resulted in recognition of composite sites that were markedly different from the individual TF's motifs. Together, our results indicate that the DNA molecule commonly plays an active role in cooperative interactions that define the gene regulatory lexicon.
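What "spacing and orientation preferences" means for a pair of motifs can be shown with a small sketch. It is not the CAP-SELEX analysis itself: it uses exact string matching, and the two motif strings and the test sequence are toy values chosen for illustration only. The helper names `find_sites` and `pair_configurations` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, motif):
    """Return (start, strand) for every exact motif match on either strand."""
    hits = []
    for strand, pattern in (("+", motif), ("-", revcomp(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return hits


def pair_configurations(seq, motif_a, motif_b):
    """List (gap, strand_a, strand_b) for each pair of A and B sites.

    gap is the number of bases between the two sites; a negative
    gap means the sites overlap.
    """
    configs = []
    for a_start, a_strand in find_sites(seq, motif_a):
        for b_start, b_strand in find_sites(seq, motif_b):
            if a_start <= b_start:
                gap = b_start - (a_start + len(motif_a))
            else:
                gap = a_start - (b_start + len(motif_b))
            configs.append((gap, a_strand, b_strand))
    return configs


# Toy sequence and motifs (assumptions, not data from the study).
seq = "TTTGACAGCCTAATTGTT"
configs = pair_configurations(seq, "TGACAG", "TAATTG")
print(configs)
```

Tallying these tuples over many selected sequences would show which gaps and strand combinations are enriched. Overlapping composite sites, which the abstract reports as common, appear here as negative gaps.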
The genetic code, the binding specificity of all transfer-RNAs, defines how protein primary structure is determined by DNA sequence. DNA also dictates when and where proteins are expressed, and this information is encoded in a pattern of specific sequence motifs that are recognized by transcription factors. However, the DNA-binding specificity is only known for a small fraction of the ~1400 human transcription factors (TFs). We describe here a high-throughput method for analyzing transcription factor binding specificity that is based on systematic evolution of ligands by exponential enrichment (SELEX) and massively parallel sequencing. The method is optimized for analysis of large numbers of TFs in parallel through the use of affinity-tagged proteins, barcoded selection oligonucleotides, and multiplexed sequencing. Data are analyzed by a new bioinformatic platform that uses the hundreds of thousands of sequencing reads obtained to control the quality of the experiments and to generate binding motifs for the TFs. The described technology allows higher throughput and identification of much longer binding profiles than current microarray-based methods. In addition, as our method is based on proteins expressed in mammalian cells, it can also be used to characterize DNA-binding preferences of full-length proteins or proteins requiring post-translational modifications. We validate the method by determining binding specificities of 14 different classes of TFs and by confirming the specificities for NFATC1 and RFX3 using ChIP-seq. Our results reveal unexpected dimeric modes of binding for several factors that were thought to preferentially bind DNA as monomers.
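A binding motif of the kind mentioned above is often summarized as a position frequency matrix. The abstract does not describe how the authors' platform builds its motifs, so the following is only a generic sketch: it assumes the enriched sequences have already been aligned and trimmed to equal length. The function name and the four toy sites are hypothetical.

```python
def position_frequency_matrix(sites, alphabet="ACGT"):
    """Return one {base: frequency} dict per aligned position."""
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        matrix.append({base: column.count(base) / len(sites) for base in alphabet})
    return matrix


# Toy aligned sites (assumption: alignment was done upstream).
sites = ["GGAA", "GGAT", "GGAA", "CGAA"]
pfm = position_frequency_matrix(sites)

# Consensus: the most frequent base at each position.
consensus = "".join(max(column, key=column.get) for column in pfm)
print(consensus)
```

A matrix like this treats positions as independent. Modelling dependencies between adjacent bases or dimer spacing would need a richer representation.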