We present a major update of the HOCOMOCO collection that consists of patterns describing DNA binding specificities for human and mouse transcription factors. In this release, we profited from a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based on in vitro data only and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe primary binding preferences of each transcription factor and reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO by MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru) that applies HOCOMOCO models for visualization of binding sites in short DNA sequences.
BackgroundAdaptive immune responses to newly encountered pathogens depend on the mobilization of antigen-specific clonotypes from a vastly diverse pool of naive T cells. Using recent advances in immune repertoire sequencing technologies, models of the immune receptor rearrangement process, and a database of annotated T cell receptor (TCR) sequences with known specificities, we explored the baseline frequencies of T cells specific for defined human leukocyte antigen (HLA) class I-restricted epitopes in healthy individuals.MethodsWe used a database of TCR sequences with known antigen specificities and a probabilistic TCR rearrangement model to estimate the baseline frequencies of TCRs specific to distinct antigens epitopespecificT-cells. We verified our estimates using a publicly available collection of TCR repertoires from healthy individuals. We also interrogated a database of immunogenic and non-immunogenic peptides is used to link baseline T-cell frequencies with epitope immunogenicity.ResultsOur findings revealed a high degree of variability in the prevalence of T cells specific for different antigens that could be explained by the physicochemical properties of the corresponding HLA class I-bound peptides. The occurrence of certain rearrangements was influenced by ancestry and HLA class I restriction, and umbilical cord blood samples contained higher frequencies of common pathogen-specific TCRs. We also identified a quantitative link between specific T cell frequencies and the immunogenicity of cognate epitopes presented by defined HLA class I molecules.ConclusionsOur results suggest that the population frequencies of specific T cells are strikingly non-uniform across epitopes that are known to elicit immune responses. This inference leads to a new definition of epitope immunogenicity based on specific TCR frequencies, which can be estimated with a high degree of accuracy in silico, thereby providing a novel framework to integrate computational and experimental genomics with basic and translational research efforts in the field of T cell immunology.Electronic supplementary materialThe online version of this article (10.1186/s13073-018-0577-7) contains supplementary material, which is available to authorized users.
Recent proteogenomic studies revealed extensive translation outside of annotated protein coding regions, such as non-coding RNAs and untranslated regions of mRNAs. This non-canonical translation is largely due to start codon plurality within the same RNA. This plurality is often due to the failure of some scanning ribosomes to recognize potential start codons leading to initiation downstream—a process termed leaky scanning. Codons other than AUG (non-AUG) are particularly leaky due to their inefficiency. Here we discuss our current understanding of non-AUG initiation. We argue for a near-ubiquitous role of non-AUG initiation in shaping the dynamic composition of mammalian proteomes.
ObjectivesMammalian genomics studies, especially those focusing on transcriptional regulation, require information on genomic locations of regulatory regions, particularly, transcription factor (TF) binding sites. There are plenty of published ChIP-Seq data on in vivo binding of transcription factors in different cell types and conditions. However, handling of thousands of separate data sets is often impractical and it is desirable to have a single global map of genomic regions potentially bound by a particular TF in any of studied cell types and conditions.Data descriptionHere we report human and mouse cistromes, the maps of genomic regions that are routinely identified as TF binding sites, organized by TF. We provide cistromes for 349 mouse and 599 human TFs. Given a TF, its cistrome regions are supported by evidence from several ChIP-Seq experiments or several computational tools, and, as an optional filter, contain occurrences of sequence motifs recognized by the TF. Using the cistrome, we provide an annotation of TF binding sites in the vicinity of human and mouse transcription start sites. This information is useful for selecting potential gene targets of transcription factors and detecting co-regulated genes in differential gene expression data.
Highlights d CpG core of C/EBP binding sites is prone to mutations in adult stem cells and cancers d CEBPB has enhanced affinity for binding sites with CpG$TpG mismatches d A model of fixation of mutations in C/EBP sites through enhanced C/EBP binding
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.