Although many long noncoding RNAs (lncRNAs) have been identified in muscle, their physiological function and regulatory mechanisms remain largely unexplored. In this study, we systematically characterized the expression profiles of lncRNAs during C2C12 myoblast differentiation and identified an intronic lncRNA, SYISL (SYNPO2 intron sense-overlapping lncRNA), that is highly expressed in muscle. Functionally, SYISL promotes myoblast proliferation and fusion but inhibits myogenic differentiation. SYISL knockout in mice results in significantly increased muscle fiber density and muscle mass. Mechanistically, SYISL recruits the enhancer of zeste homolog 2 (EZH2) protein, the core component of polycomb repressive complex 2 (PRC2), to the promoters of the cell-cycle inhibitor gene p21 and muscle-specific genes such as myogenin (MyoG), muscle creatine kinase (MCK), and myosin heavy chain 4 (Myh4), leading to H3K27 trimethylation and epigenetic silencing of target genes. Taken together, our results reveal that SYISL is a repressor of muscle development and plays a vital role in PRC2-mediated myogenesis.
Transcription factors (TFs) bind DNA by recognizing specific sequence motifs, typically 6–12 bp long. A motif can occur many thousands of times in the human genome, but only a subset of those sites is actually bound. Here we present a machine learning framework that leverages existing convolutional neural network architectures and model interpretation techniques to identify and interpret the sequence context features most important for predicting whether a particular motif instance will be bound. We apply our framework to predict binding at motifs for 38 TFs in a lymphoblastoid cell line, score the importance of context sequences at base-pair resolution, and characterize the context features most predictive of binding. We find that the choice of training data heavily influences both classification accuracy and the relative importance of features such as open chromatin. Overall, our framework provides new insights into the features predictive of TF binding and is likely to inform future deep learning applications for interpreting non-coding genetic variants.
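To make the idea of a motif instance concrete, here is a minimal sketch of how motif occurrences might be located before any binding prediction is made. It is not the authors' pipeline: the 4-bp weight matrix `TOY_PWM`, the helper names, and the threshold are all hypothetical, chosen only to show that sliding a weight matrix over one-hot-encoded DNA is the same operation a first-layer convolutional filter performs.

```python
# Illustrative only: TOY_PWM, one_hot, scan and motif_hits are hypothetical
# names, not part of the framework described in the abstract.

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as rows of four 0/1 values; unknown bases become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def scan(seq, pwm):
    """Slide the weight matrix along seq and return one score per window start."""
    x = one_hot(seq)
    width = len(pwm)
    scores = []
    for start in range(len(x) - width + 1):
        s = 0.0
        for offset in range(width):
            for k in range(4):
                s += x[start + offset][k] * pwm[offset][k]
        scores.append(s)
    return scores


# Made-up 4-bp matrix that scores "TGAC" highest (columns are A, C, G, T).
# Real TF motifs are typically 6-12 bp long.
TOY_PWM = [
    [0.0, 0.0, 0.0, 1.0],  # T
    [0.0, 0.0, 1.0, 0.0],  # G
    [1.0, 0.0, 0.0, 0.0],  # A
    [0.0, 1.0, 0.0, 0.0],  # C
]


def motif_hits(seq, pwm, threshold):
    """Return window start positions whose score reaches the threshold."""
    return [i for i, s in enumerate(scan(seq, pwm)) if s >= threshold]


hits = motif_hits("AATGACGGTGACTT", TOY_PWM, threshold=4.0)
```

Every position in `hits` is a candidate site with an identical core sequence, so whether it is bound has to be predicted from the flanking context, which is the classification problem the abstract describes.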
Mechanisms by which noncoding genetic variation influences gene expression remain only partially understood but are considered to be major determinants of phenotypic diversity and disease risk. Here, we evaluated effects of >50 million single-nucleotide polymorphisms and short insertions/deletions provided by five inbred strains of mice on the responses of macrophages to interleukin-4 (IL-4), a cytokine that plays pleiotropic roles in immunity and tissue homeostasis. Of >600 genes induced >2-fold by IL-4 across the five strains, only 26 genes reached this threshold in all strains. By applying deep learning and motif mutation analyses to epigenetic data for macrophages from each strain, we identified the dominant combinations of lineage-determining and signal-dependent transcription factors driving IL-4 enhancer activation. These studies further revealed mechanisms by which noncoding genetic variation influences absolute levels of enhancer activity and their dynamic responses to IL-4, thereby contributing to strain-differential patterns of gene expression and phenotypic diversity.
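The gap between the ">600 genes" and "26 genes" figures is the difference between a union and an intersection of per-strain gene sets. The sketch below illustrates that distinction on made-up numbers; the strain labels, gene labels, fold changes, and function names are all hypothetical and are not data from the study.

```python
# Illustrative only: all labels and fold-change values below are invented.

def induced_genes(fold_changes, threshold=2.0):
    """Genes whose fold change after stimulation exceeds the threshold."""
    return {gene for gene, fc in fold_changes.items() if fc > threshold}


def summarize(by_strain, threshold=2.0):
    """Return (genes induced in at least one strain, genes induced in every strain)."""
    sets = [induced_genes(fc, threshold) for fc in by_strain.values()]
    if not sets:
        return set(), set()
    in_any = set().union(*sets)
    in_all = set.intersection(*sets)
    return in_any, in_all


TOY_FOLD_CHANGES = {
    "strain_A": {"gene1": 3.0, "gene2": 2.5, "gene3": 1.1},
    "strain_B": {"gene1": 4.0, "gene2": 1.2, "gene3": 2.2},
}

in_any, in_all = summarize(TOY_FOLD_CHANGES)
```

In this toy case three genes pass the threshold in at least one strain but only one passes it in both, mirroring on a small scale the strain-specific induction the abstract reports.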
Transcription factors (TFs) bind DNA by recognizing highly specific DNA sequence motifs, typically 6–12 bp long. A TF motif can occur tens of thousands of times in the human genome, but only a small fraction of those sites is actually bound. Despite the availability of genome-wide TF binding maps for hundreds of TFs, predicting whether a given motif occurrence is bound and identifying the influential context features remain challenging. Here we present a machine learning framework that leverages existing convolutional neural network architectures and state-of-the-art model interpretation techniques to identify, visualize, and interpret the context features most important for determining binding activity for a particular TF. We apply our framework to predict binding at motifs for 38 TFs in a lymphoblastoid cell line and achieve superior classification performance compared to existing frameworks. We compute importance scores for context regions at single-base-pair resolution and uncover known and novel determinants of TF binding. Finally, we demonstrate that important context bases are under increased purifying selection compared to nearby bases and are enriched in disease-associated variants identified by genome-wide association studies.
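The abstract does not name the interpretation technique used to obtain per-base importance scores. As a simple stand-in, the sketch below uses in silico mutagenesis: each base is swapped for the three alternatives, and the average drop in the model's output is taken as that base's importance. The `toy_model` function is a hypothetical placeholder for a trained classifier, not the authors' network.

```python
# Illustrative only: toy_model and importance are hypothetical names, and
# in silico mutagenesis is a swapped-in technique, not the method in the abstract.

BASES = "ACGT"


def toy_model(seq):
    """Placeholder for a trained classifier: counts occurrences of 'TGAC'."""
    return float(seq.count("TGAC"))


def importance(seq, model):
    """Per-base score: mean drop in model output over the three alternative bases."""
    ref = model(seq)
    scores = []
    for i, base in enumerate(seq):
        alternatives = [b for b in BASES if b != base]
        drops = [ref - model(seq[:i] + b + seq[i + 1:]) for b in alternatives]
        scores.append(sum(drops) / len(drops))
    return scores


scores = importance("AATGACAA", toy_model)
```

With this toy model only the four motif bases receive a non-zero score. A model that had learned context features would also assign importance to flanking positions, which is the kind of signal the abstract refers to.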