Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.

[Supplemental material is available for this article.]

Transcription factors (TFs) are critical for the transcriptional regulation of gene expression (Takahashi and Yamanaka 2006; Vaquerizas et al. 2009). In humans, they represent the largest family of proteins, accounting for around 10% of genes (Babu et al. 2004). There are two types of TFs: general and sequence-specific. The former TFs act cooperatively with RNA polymerase II and are ubiquitously involved in the transcription of a large fraction of genes (Lee and Young 2000). The latter TFs bind specific subsets of target genes, leading to distinct spatiotemporal patterns of gene expression (Kadonaga 2004).
Although systematic gene expression quantification has been available for a decade from microarray experiments (Schena et al. 1995), only recently has the genome-wide identification of TF-binding sites become possible, owing to the development of chromatin immunoprecipitation followed by microarray (ChIP-chip) and sequencing (ChIP-seq) technologies (Ren et al. 2000; Johnson et al. 2007).

In several previous studies, statistical models were constructed to study the regulatory functions of TFs on gene expression based on the gene expression and TF-binding data (Ouyang et al. 2009; Cheng and Gerstein 2011). These studies showed that TF-binding signals around the transcription start sites (TSSs) of genes are predictive of gene expression levels with fairly high accuracy. But these studies have the following limitations: First, estimates of gene expression have relied on probes (microarray) or sequence reads (RNA-seq) spread across a gene, possibly across multiple unknown isoforms of that gene. It is often difficult to accurately determine the expression level of each transcript based on such data, which...