Eukaryotic gene transcription is regulated by a large cohort of chromatin-associated proteins, and inferring their differential binding sites between cellular contexts requires a rigorous comparison of the corresponding ChIP-seq data. We present MAnorm2, a new computational tool for quantitatively comparing groups of ChIP-seq samples. MAnorm2 uses a hierarchical strategy to normalize ChIP-seq data and then performs differential analysis by assessing the within-group variability of ChIP-seq signals under an empirical Bayes framework. In this framework, MAnorm2 considers the abundance of differential ChIP-seq signals between groups of samples and the possibility that within-group variability differs between groups. When samples in each group are biological replicates, MAnorm2 can reliably identify differential binding events even between highly similar …
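The empirical Bayes idea can be illustrated with a minimal sketch that is not MAnorm2's implementation: each region's observed within-group variance is shrunk toward a prior variance before a t-like statistic is formed. The function name `moderated_t` and the fixed `prior_var` and `prior_df` arguments are hypothetical; a real tool would estimate the prior from all regions. Also, unlike MAnorm2, this sketch pools both groups into one common variance instead of allowing group-specific variability.

```python
import numpy as np


def moderated_t(group_a, group_b, prior_var, prior_df):
    """Moderated t-statistics for two groups of normalized log2 signals.

    group_a, group_b: arrays of shape (n_regions, n_samples) (hypothetical inputs).
    prior_var, prior_df: hypothetical prior variance and prior degrees of
    freedom; here they are supplied by the caller, not estimated.
    Assumes a common within-group variance for both groups (a simplification).
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    na, nb = a.shape[1], b.shape[1]
    df = na + nb - 2  # residual degrees of freedom for each region

    # Pooled within-group sample variance per region.
    ss_a = ((a - a.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    ss_b = ((b - b.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    s2 = (ss_a + ss_b) / df

    # Empirical Bayes shrinkage: a df-weighted average of prior and observed variance.
    s2_post = (prior_df * prior_var + df * s2) / (prior_df + df)

    # Difference of group means divided by its moderated standard error.
    diff = b.mean(axis=1) - a.mean(axis=1)
    se = np.sqrt(s2_post * (1.0 / na + 1.0 / nb))
    return diff / se, s2_post
```

With only a few replicates per group, the prior keeps regions whose sample variance happens to be near zero from producing inflated statistics, which is the main motivation for borrowing information across regions.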
Background: Gene transcription in eukaryotic cells is collectively controlled by a large panel of chromatin-associated proteins, and ChIP-seq is now widely used to locate their binding sites across the whole genome. Inferring the differential binding sites of these proteins between biological conditions by comparing the corresponding ChIP-seq samples is of general interest, yet it remains a computationally challenging task. Results: Here, we briefly review the computational tools developed in recent years for differential binding analysis with ChIP-seq data. The methods are systematically classified by their statistical modeling strategy and scope of application. Finally, a decision tree is presented for choosing proper tools based on the specific dataset. Conclusions: Computational tools for differential binding analysis with ChIP-seq data vary significantly with respect to their applicability and performance. This review can serve as a practical guide for readers to select appropriate tools for their own datasets.
Background: Lung adenocarcinoma (LUAD) is a highly malignant and heterogeneous tumor that involves various oncogenic genetic alterations. Epigenetic processes play important roles in lung cancer development. However, the variation in the enhancer and super-enhancer landscapes of LUAD patients remains largely unknown. To provide an in-depth understanding of the epigenomic heterogeneity of LUAD, we investigate the H3K27ac histone modification profiles of tumors and adjacent normal lung tissues from 42 LUAD patients and explore the role of epigenetic alterations in LUAD progression. Results: High intertumoral epigenetic heterogeneity is observed across the LUAD H3K27ac profiles. We quantitatively model the intertumoral variability of H3K27ac levels at proximal gene promoters and distal enhancers and propose a new epigenetic classification of LUAD patients. Our classification defines two LUAD subgroups that are highly related to histological subtypes. Group II patients have a significantly worse prognosis than group I patients, which is further confirmed in the public TCGA-LUAD cohort. Differential RNA-seq analysis between group I and group II reveals that the genes upregulated in group II tend to promote cell proliferation and induce cell de-differentiation. We construct gene co-expression networks and identify group-specific core regulators. Most of these core regulators are linked with group-specific regulatory elements, such as super-enhancers. We further show that CLU is regulated by three group I-specific core regulators and functions as a novel tumor suppressor in LUAD. Conclusions: Our study systematically characterizes the epigenetic alterations during LUAD progression and provides a new classification model that is helpful for predicting patient prognosis.
Recently, long non-coding RNAs (lncRNAs) have emerged as an important class of molecules involved in many cellular processes. One of their primary functions is to shape the epigenetic landscape through interactions with chromatin-modifying proteins. However, the mechanisms that determine the specificity of such interactions remain poorly understood. Here we took the human and mouse lncRNAs that were experimentally determined to physically interact with Polycomb repressive complex 2 (PRC2) and systematically investigated their sequence features by developing a new computational pipeline for sequence composition analysis, in which each sequence is considered as a series of transitions between adjacent nucleotides. Using this pipeline, we found that PRC2-binding lncRNAs are associated with a set of distinctive and evolutionarily conserved sequence features, which can be used to distinguish them from other lncRNAs with considerable accuracy. We further identified fragments of PRC2-binding lncRNAs that are enriched with these sequence features, and found that they show strong PRC2-binding signals and are more highly conserved across species than the remaining parts of the transcripts, implying their functional importance.
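Reading a sequence as a series of transitions between adjacent nucleotides can be sketched as counting the 16 possible dinucleotide steps and normalizing by the number of steps. This is a minimal illustration under that assumption only; the function `transition_frequencies` is hypothetical and does not reproduce the authors' pipeline or the specific features they report.

```python
from itertools import product

NUCLEOTIDES = "ACGU"


def transition_frequencies(seq):
    """Relative frequency of each adjacent-nucleotide transition x_i -> x_{i+1}.

    DNA input is accepted by mapping T to U. Transitions that involve any
    symbol outside A/C/G/U (e.g. N) are skipped and not counted as steps.
    """
    seq = seq.upper().replace("T", "U")
    counts = {a + b: 0 for a, b in product(NUCLEOTIDES, repeat=2)}
    steps = 0
    for a, b in zip(seq, seq[1:]):
        key = a + b
        if key in counts:
            counts[key] += 1
            steps += 1
    if steps == 0:
        # Sequences shorter than two valid nucleotides yield an all-zero profile.
        return {k: 0.0 for k in counts}
    return {k: v / steps for k, v in counts.items()}
```

Each sequence becomes a fixed-length vector of 16 values regardless of its length, so lncRNAs of different lengths can be compared directly or passed to a classifier.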
Isotope-labeling-based mass spectrometry (MS) is widely used in quantitative proteomic studies. With this technique, the relative abundance of thousands of proteins can be efficiently profiled in parallel, greatly facilitating the detection of proteins differentially expressed across samples. However, this task remains computationally challenging. Here we present a new approach, termed Model-based Analysis of Proteomic data (MAP), for this task. Unlike many existing methods, MAP does not require technical replicates to model technical and systematic errors, and instead uses a novel step-by-step regression analysis to directly assess the significance of observed protein abundance changes. We applied MAP to compare the proteomic profiles of undifferentiated and differentiated mouse embryonic stem cells (mESCs), and found that it outperforms existing tools in detecting proteins differentially expressed during mESC differentiation. A web-based application of MAP is provided for online data processing at http://bioinfo.sibs.ac.cn/shaolab/MAP.
Identifying genomic regions with hypervariable ChIP-seq or ATAC-seq signals across given samples is essential for large-scale epigenetic studies. In particular, the hypervariable regions across tumors from different patients indicate their heterogeneity and can contribute to revealing potential cancer subtypes and the associated epigenetic markers. We present HyperChIP as the first complete statistical tool for the task. HyperChIP uses scaled variances that account for the mean-variance dependence to rank genomic regions, and it increases the statistical power by diminishing the influence of true hypervariable regions on model fitting. A pan-cancer case study illustrates the practical utility of HyperChIP.
With the reduction in sequencing costs, studies that use ChIP/ATAC-seq techniques to profile the chromatin landscape of tens or even hundreds of human individuals have become prevalent. Identifying genomic regions with hypervariable ChIP/ATAC-seq signals across given samples is essential for such studies. In particular, the hypervariable regions (HVRs) across tumors from different patients indicate their heterogeneity and can contribute to revealing potential cancer subtypes and the associated epigenetic markers. We present HyperChIP as the first complete statistical tool for this task. HyperChIP uses scaled variances that account for the mean-variance dependence to rank genomic regions, and it increases statistical power by diminishing the influence of true HVRs on model fitting. Applying it to a large pan-cancer ATAC-seq data set, we found that the identified HVRs not only provided a solid basis for uncovering the underlying similarity structure among the involved tumor samples but also, when coupled with a motif-scanning analysis, led to the identification of transcription factors pertaining to that similarity structure.
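The idea of ranking regions by a variance scaled against the mean-variance trend, while limiting the influence of true HVRs on the fit, can be sketched as follows. This is an assumption-laden illustration, not HyperChIP's model: a polynomial fit of log variance on mean stands in for its mean-variance curve, and iteratively excluding the regions with the largest residuals (the hypothetical `n_iter` and `trim` parameters) is one simple way to keep hypervariable regions from pulling the trend upward.

```python
import numpy as np


def scaled_variances(signals, degree=2, n_iter=3, trim=0.1):
    """Ratio of each region's sample variance to a fitted mean-variance trend.

    signals: array of shape (n_regions, n_samples) of normalized log2 signals.
    degree, n_iter, trim: hypothetical tuning parameters of this sketch.
    Larger returned values indicate more hypervariable regions.
    """
    x = np.asarray(signals, dtype=float)
    means = x.mean(axis=1)
    variances = x.var(axis=1, ddof=1)
    log_var = np.log(variances + 1e-12)  # small offset guards against log(0)

    keep = np.ones(means.shape[0], dtype=bool)  # first fit uses every region
    for _ in range(max(1, n_iter)):
        coef = np.polyfit(means[keep], log_var[keep], degree)
        resid = log_var - np.polyval(coef, means)
        # Drop the top `trim` fraction of residuals from the next fit, so that
        # putative hypervariable regions have less influence on the trend.
        cutoff = np.quantile(resid, 1.0 - trim)
        keep = resid <= cutoff

    # The trend used for scaling comes from the last fit.
    trend = np.exp(np.polyval(coef, means))
    return variances / trend
```

Because every region has the same number of samples, fitting the trend on the log scale shifts all ratios by roughly the same factor, so the ranking of regions matters here rather than the absolute scaled values.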