DNA methylation is an important epigenetic modification involved in many biological processes and diseases. Recent developments in whole genome bisulfite sequencing (WGBS) technology have enabled genome-wide measurements of DNA methylation at single base pair resolution. Many experiments have been conducted to compare DNA methylation profiles under different biological contexts, with the goal of identifying differentially methylated regions (DMRs). Due to the high cost of WGBS experiments, many studies are still conducted without biological replicates, and methods and tools for analyzing such data are very limited. We develop a statistical method, DSS-single, for detecting DMRs from WGBS data without replicates. We characterize the count data using a rigorous model that accounts for the spatial correlation of methylation levels, sequencing depth and biological variation. We demonstrate that, by using information from neighboring CG sites, biological variation can be estimated accurately even without replicates. DMR detection is then carried out via a Wald test procedure. Simulations demonstrate that DSS-single has greater sensitivity and accuracy than existing methods, and an analysis of H1 versus IMR90 cell lines suggests that it also yields the most biologically meaningful results. DSS-single is implemented in the Bioconductor package DSS.
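The core idea, borrowing strength from neighboring CG sites so that a single sample per condition can still be tested with a Wald statistic, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DSS-single implementation or the DSS package API: the smoothing is a plain moving average over a hypothetical ±500 bp window, the beta-binomial dispersion `phi` is fixed at a hypothetical 0.05 rather than estimated, and p-values use a normal approximation.

```python
import math

def smoothed_levels(meth, total, pos, span=500, eps=1e-3):
    """Moving-average methylation level: pool read counts from all CpGs within +/- span bp."""
    levels = []
    for p in pos:
        idx = [j for j, q in enumerate(pos) if abs(q - p) <= span]
        m = sum(meth[j] for j in idx)
        n = sum(total[j] for j in idx)
        mu = m / n if n > 0 else 0.5
        # Clip away from 0 and 1 so the variance below stays positive.
        levels.append(min(max(mu, eps), 1 - eps))
    return levels

def wald_test(meth1, total1, meth2, total2, pos, phi=0.05, span=500):
    """Per-CpG Wald statistic and two-sided p-value for mu1 - mu2 (one sample per condition).

    Assumption: phi is a fixed, hypothetical dispersion; DSS-single estimates it from data.
    """
    mu1 = smoothed_levels(meth1, total1, pos, span)
    mu2 = smoothed_levels(meth2, total2, pos, span)
    results = []
    for a, b, n1, n2 in zip(mu1, mu2, total1, total2):
        if n1 == 0 or n2 == 0:
            results.append((float("nan"), float("nan")))  # no coverage: untestable
            continue
        # Beta-binomial variance of the observed proportion X/N with dispersion phi.
        v1 = a * (1 - a) / n1 * (1 + (n1 - 1) * phi)
        v2 = b * (1 - b) / n2 * (1 + (n2 - 1) * phi)
        z = (a - b) / math.sqrt(v1 + v2)
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results.append((z, p))
    return results

# Toy example: four nearby CpGs that differ strongly, plus one distant CpG that does not.
pos = [100, 150, 220, 300, 5000]
res = wald_test([18, 19, 17, 20, 10], [20] * 5, [2, 1, 3, 2, 10], [20] * 5, pos)
for site, (z, p) in zip(pos, res):
    print(site, round(z, 2), p)
```

Because the four clustered CpGs share one smoothing window, they receive the same smoothed level and hence the same large Wald statistic, while the isolated CpG with equal methylation in both conditions gets a statistic of zero. Calling DMRs would then merge runs of significant CpGs into regions, a step omitted here.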
Understanding the link between non-coding sequence variants identified in genome-wide association studies and the pathophysiology of complex diseases remains challenging due to a lack of annotations in non-coding regions. To overcome this, we developed DIVAN, a novel feature selection and ensemble learning framework that identifies disease-specific risk variants by leveraging a comprehensive collection of genome-wide epigenomic profiles across cell types and factors, along with other static genomic features. DIVAN accurately and robustly recognizes non-coding disease-specific risk variants under multiple testing scenarios; among all the features, histone marks, especially those associated with repressed chromatin, are often more informative than others.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-1112-z) contains supplementary material, which is available to authorized users.
Epigenetic modifications such as cytosine methylation and histone modification are linked to the pathology of ischemic brain injury. Recent research has implicated 5-hydroxymethylcytosine (5hmC), a DNA base derived from 5-methylcytosine (5mC) via oxidation by ten-eleven translocation (Tet) enzymes, in DNA methylation-related plasticity. Here we show that 5hmC abundance increased after ischemic injury and that Tet2 was responsible for this increase; furthermore, inhibiting Tet2 expression abolished the increase of 5hmC caused by ischemic injury. The decrease in 5hmC modifications from inhibiting Tet2 activity was accompanied by increased infarct volume after ischemic injury. Genome-wide profiling of 5hmC revealed differentially hydroxymethylated regions (DhMRs) associated with ischemic injury, and these DhMRs were enriched among genes involved in cell junction, neuronal morphogenesis and neurodevelopment. In particular, we found that 5hmC modifications at the promoter region of brain-derived neurotrophic factor (BDNF) increased, which was accompanied by increased BDNF mRNA, whereas the inhibition of Tet2 reduced BDNF mRNA and protein expression. Finally, we show that the abundance of 5hmC in blood samples from patients with acute ischemic stroke was also significantly increased. Together, these data suggest that 5hmC modification could serve as both a potential biomarker for and a therapeutic target in ischemic stroke.
Environmental stress is among the most important contributors to increased susceptibility to psychiatric disorders, including anxiety and post-traumatic stress disorder. While even acute stress alters gene expression, the molecular mechanisms underlying these changes remain largely unknown. 5-Hydroxymethylcytosine (5hmC) is a novel, environmentally sensitive DNA modification that is highly enriched in post-mitotic neurons and is associated with active transcription of neuronal genes. We recently found a hippocampal increase of 5hmC in the glucocorticoid receptor gene (Nr3c1) following acute stress, warranting a deeper investigation of stress-related 5hmC levels. Here, we used an established chemical labeling and affinity purification method coupled with high-throughput sequencing to generate the first genome-wide profile of hippocampal 5hmC following exposure to acute restraint stress and a one-hour recovery. This approach revealed a genome-wide disruption of 5hmC associated with the acute stress response, primarily in genic regions, and identified known and potentially novel stress-related targets that are significantly enriched for neuronal ontological functions. Integrating these data with hippocampal gene expression data from the same mice showed that stress-related hydroxymethylation correlated with altered transcript levels, and sequence motif predictions indicated that 5hmC may function by mediating transcription factor binding to these transcripts. Together, these data reveal an environmental impact on this newly discovered epigenetic mark in the brain and represent a critical step toward understanding stress-related epigenetic mechanisms that alter gene expression and can lead to the development of psychiatric disorders.
Summary
Single-cell assay for transposase-accessible chromatin followed by sequencing (scATAC-seq) is an emerging technology for studying gene regulation at single-cell resolution. scATAC-seq data are unique: sparse, binary and highly variable even within the same cell type. As such, methods developed for neither bulk ATAC-seq nor single-cell RNA-seq data are appropriate. Here, we present Destin, a bioinformatic and statistical framework for comprehensive scATAC-seq data analysis. Destin performs cell-type clustering via weighted principal component analysis, weighting accessible chromatin regions by existing genomic annotations and publicly available regulomic datasets. The weights and additional tuning parameters are determined via model-based likelihood. We evaluated the performance of Destin using downsampled bulk ATAC-seq data of purified samples and scATAC-seq data from seven diverse experiments. Destin outperformed existing methods across all datasets and platforms. For demonstration, we further applied Destin to 2088 adult mouse forebrain cells and identified cell-type-specific associations of previously reported schizophrenia GWAS loci.
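The weighted principal component step can be illustrated on a tiny binary cells-by-regions matrix. This is a minimal sketch under stated assumptions, not Destin's code: the per-region weights are fixed, hypothetical numbers (Destin derives them from genomic annotations and regulomic data and tunes them by model-based likelihood), columns are simply mean-centered before scaling, and the downstream clustering of cells on the resulting scores is omitted.

```python
import numpy as np

def weighted_pca(X, weights, n_components=2):
    """Cell scores on the top principal components of a region-weighted binary matrix.

    X: cells x regions, 1 = accessible region observed in that cell, 0 = not observed.
    weights: one non-negative weight per region (column); hypothetical here.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # center each region across cells
    Xw = Xc * np.asarray(weights, dtype=float)     # rescale each region by its weight
    U, S, Vt = np.linalg.svd(Xw, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # cells x n_components score matrix

# Toy data: cells 0-2 are open at regions 0-1, cells 3-5 at regions 2-3;
# region 4 is an uninformative peak given a low (hypothetical) weight.
X = [[1, 1, 0, 0, 1],
     [1, 1, 0, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 1, 1, 1],
     [0, 0, 1, 1, 0],
     [0, 0, 1, 1, 0]]
weights = [1.0, 1.0, 1.0, 1.0, 0.1]
scores = weighted_pca(X, weights)
print(np.round(scores, 3))
```

The first component separates the two groups of cells by sign, while the down-weighted region contributes only a small second component. A clustering algorithm such as k-means would then be run on these scores to assign cell types.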
Availability and implementation
Destin toolkit is freely available as an R package at https://github.com/urrutiag/destin.
Supplementary information
Supplementary data are available at Bioinformatics online.