Machine learning methods have been widely applied to big data analysis in genomics and epigenomics research. Although accuracy and efficiency are common goals in many modeling tasks, model interpretability is especially important in these studies because it helps reveal the underlying molecular and cellular mechanisms. Deep neural networks (DNNs) have recently gained popularity in various types of genomic and epigenomic studies because they can exploit large-scale, high-throughput bioinformatics data and achieve high accuracy in prediction and classification. However, because of their black-box nature, DNNs are often criticized for their limited ability to explain their predictions. In this review, we present current developments in the interpretation of DNN models, focusing on their applications in genomics and epigenomics. We first describe state-of-the-art DNN interpretation methods in representative machine learning fields. We then summarize the DNN interpretation methods used in recent genomic and epigenomic studies, focusing on data- and computing-intensive topics such as sequence motif identification, genetic variation, gene expression, chromatin interactions and non-coding RNAs. We also present the biological discoveries that resulted from these interpretation methods. Finally, we discuss the advantages and limitations of current interpretation approaches in the context of genomic and epigenomic studies. Contact: xiaoman@mail.ucf.edu, haihu@cs.ucf.edu
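As one concrete illustration of a perturbation-based interpretation approach commonly used for sequence motif identification, the sketch below implements in silico mutagenesis (ISM): every single-base substitution is scored, and the change relative to the reference sequence is recorded. The model here is an assumption: `score_sequence` is a hypothetical toy motif counter standing in for a trained DNN, so the example runs without any deep learning library. It is not the method of any particular study covered by the review.

```python
"""Minimal in silico mutagenesis (ISM) sketch for a sequence model.

Assumption: `score_sequence` is a hypothetical stand-in for a trained DNN;
here it simply counts occurrences of a motif so the example is runnable.
"""
from typing import Callable, List

BASES = "ACGT"


def score_sequence(seq: str, motif: str = "TATA") -> float:
    """Toy model (placeholder for a DNN): number of exact motif occurrences."""
    k = len(motif)
    return float(sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif))


def in_silico_mutagenesis(seq: str,
                          model: Callable[[str], float]) -> List[List[float]]:
    """Return an L x 4 matrix of score changes for all single-base substitutions.

    Entry [i][j] = model(seq with BASES[j] at position i) - model(seq);
    the reference base at each position gets 0.0.
    """
    ref_score = model(seq)
    effects: List[List[float]] = []
    for i, ref_base in enumerate(seq):
        row: List[float] = []
        for alt in BASES:
            if alt == ref_base:
                row.append(0.0)
            else:
                mutant = seq[:i] + alt + seq[i + 1:]
                row.append(model(mutant) - ref_score)
        effects.append(row)
    return effects


if __name__ == "__main__":
    seq = "GGTATAGG"
    ism = in_silico_mutagenesis(seq, score_sequence)
    # Positions where some mutation strongly lowers the score mark the motif.
    for pos, (base, row) in enumerate(zip(seq, ism)):
        print(pos, base, min(row))
```

With a real network, `model` would wrap the network's forward pass on a one-hot encoding; the cost is 3L extra predictions per sequence, which is why gradient-based attribution methods are often used as cheaper approximations.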
Motivation: MicroRNAs (miRNAs) are small noncoding RNAs that play important roles in gene regulation and phenotype development. The identification of miRNA transcription start sites (TSSs) is critical to understanding the functional roles of miRNA genes and their transcriptional regulation. Unlike those of protein-coding genes, miRNA TSSs are not directly detectable from conventional RNA-Seq experiments because of the miRNA-specific biogenesis process. In the past decade, large-scale genome-wide TSS-Seq and transcription activation marker profiling data have become available, and many computational methods have been developed based on these data. These methods have greatly advanced genome-wide miRNA TSS annotation. Results: In this study, we summarized recent computational methods and their results on miRNA TSS annotation. We collected and performed a comparative analysis of miRNA TSS annotations from 14 representative studies. We further compiled a robust set of miRNA TSSs (RSmirT) supported by multiple studies. Integrative genomic and epigenomic data analysis of RSmirT revealed the genomic and epigenomic features of miRNA TSSs as well as their relationships to protein-coding and long non-coding genes. Contact: xiaoman@mail.ucf.edu, haihu@cs.ucf.edu
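The abstract does not state how support from multiple studies was defined for RSmirT, so the following is only a minimal sketch of one plausible way to build such a consensus set. It assumes that predictions for the same miRNA on the same chromosome and strand are clustered greedily within a hypothetical `window` (in bp) and that a cluster is kept when at least `min_studies` distinct studies contribute to it. Both parameters, the tuple layout, and the toy data are illustrative assumptions, not the published criteria.

```python
"""Sketch: consensus TSS calls supported by several independent studies.

Hypothetical criteria (not the RSmirT definition): predictions for one miRNA
on one chromosome/strand join a cluster if they lie within `window` bp of the
cluster's first (smallest) position; clusters backed by >= `min_studies`
distinct studies are reported.
"""
from collections import defaultdict
from statistics import median_low
from typing import DefaultDict, List, Tuple

# (study_id, mirna_id, chrom, strand, tss_position)
Prediction = Tuple[str, str, str, str, int]
# (mirna_id, chrom, strand, representative_tss, n_supporting_studies)
Consensus = Tuple[str, str, str, int, int]


def consensus_tss(predictions: List[Prediction],
                  window: int = 500,
                  min_studies: int = 2) -> List[Consensus]:
    groups: DefaultDict[Tuple[str, str, str], List[Tuple[int, str]]] = defaultdict(list)
    for study, mirna, chrom, strand, pos in predictions:
        groups[(mirna, chrom, strand)].append((pos, study))

    consensus: List[Consensus] = []
    for (mirna, chrom, strand), items in sorted(groups.items()):
        items.sort()  # order predictions by genomic position
        clusters: List[List[Tuple[int, str]]] = [[items[0]]]
        for pos, study in items[1:]:
            # Join the current cluster if close enough to its first member.
            if pos - clusters[-1][0][0] <= window:
                clusters[-1].append((pos, study))
            else:
                clusters.append([(pos, study)])
        for cluster in clusters:
            studies = {study for _, study in cluster}
            if len(studies) >= min_studies:
                # Representative TSS: lower median of the clustered positions.
                tss = median_low([pos for pos, _ in cluster])
                consensus.append((mirna, chrom, strand, tss, len(studies)))
    return consensus


if __name__ == "__main__":
    # Toy, made-up predictions from three hypothetical studies.
    toy = [("S1", "miR-A", "chr1", "+", 1000),
           ("S2", "miR-A", "chr1", "+", 1200),
           ("S3", "miR-A", "chr1", "+", 5000),
           ("S1", "miR-B", "chr2", "-", 3000)]
    print(consensus_tss(toy))
```

Counting distinct studies rather than raw predictions keeps a single study that reports several nearby TSSs from inflating the support of a cluster.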
Motivation: The identification of enhancer–promoter interactions (EPIs), especially condition-specific ones, is important for the study of gene transcriptional regulation. Existing experimental approaches for EPI identification are still expensive, and available computational methods either do not consider condition-specific EPIs or predict them poorly. Results: We developed a novel computational method called EPIP to reliably predict EPIs, especially condition-specific ones. EPIP can predict interactions both in samples with limited data and in samples with abundant data. Tested on more than eight cell lines, EPIP reliably identifies EPIs, with an average area under the receiver operating characteristic curve (AUROC) of 0.95 and an average area under the precision–recall curve (AUPRC) of 0.73. Tested on condition-specific EPIs, EPIP correctly identified 99.26% of them. EPIP also achieves better accuracy than two recently developed methods. Availability and implementation: The EPIP tool is freely available at http://www.cs.ucf.edu/~xiaoman/EPIP/. Supplementary information: Supplementary data are available at Bioinformatics online.
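The two summary statistics quoted for EPIP measure different things: AUROC is the probability that a randomly chosen true interaction is scored above a randomly chosen non-interaction, while the area under the precision–recall curve is more sensitive to class imbalance, which is typical when true EPIs are rare among candidate pairs. The sketch below computes both from labels and scores in plain Python. It is an illustration of the metrics only; the abstract does not say which AUPRC estimator was used, and average precision is assumed here as one common choice.

```python
"""Sketch: AUROC and AUPRC (as average precision) from labels and scores.

Assumption: AUPRC is estimated by average precision, one common estimator;
the abstract does not specify the estimator used for EPIP.
"""
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count 1/2). O(P * N), which is fine for small examples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve: the mean of the
    precision at the rank of each positive. Assumes no tied scores."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / k  # precision at this recall step
    return total / n_pos


if __name__ == "__main__":
    y = [1, 0, 1, 0, 0]            # 1 = interacting pair (toy labels)
    s = [0.9, 0.8, 0.7, 0.3, 0.1]  # toy prediction scores
    print(auroc(y, s), average_precision(y, s))
```

The "average" in the reported figures refers to averaging these per-cell-line values; for large datasets a sort-based O(n log n) AUROC computation would replace the pairwise loop.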
Background: Increased lower body fat is associated with reduced cardiometabolic risk. The molecular basis for depot-specific differences in gluteofemoral (GF) compared with abdominal (A) subcutaneous adipocyte function is poorly understood. In the current report, we used a combination of Assay for Transposase-Accessible Chromatin followed by sequencing (ATAC-seq), RNA-seq, and chromatin immunoprecipitation (ChIP)-qPCR analyses, which provide evidence that depot-specific gene expression patterns are associated with differential epigenetic chromatin signatures. Methods: Preadipocytes cultured from A and GF adipose tissue obtained from premenopausal apple-shaped women were used to perform transcriptome analysis by RNA-seq and to assess accessible chromatin regions by ATAC-seq. We measured mRNA expression and performed ChIP-qPCR experiments for histone modifications marking active (H3K4me3) and repressed (H3K27me3) chromatin at the promoter regions of differentially expressed genes. Results: RNA-seq experiments revealed A-fat and GF-fat selective gene expression signatures, with 126 genes upregulated in abdominal preadipocytes and 90 genes upregulated in GF cells. ATAC-seq identified almost 10 times more A-specific than GF-specific chromatin-accessible regions. Using a combined analysis of ATAC-seq and global gene expression data, we found that 74 of the 126 abdominal-specific genes (59%) had A-specific accessible chromatin sites within 200 kb of the transcription start site (TSS), including HOXA3, HOXA5, IL8, IL1b, and IL6. Interestingly, only 14 of the 90 GF-specific genes (15%) had GF-specific accessible chromatin sites within 200 kb of the corresponding TSS, including HOXC13 and HOTAIR, whereas 25 of them (28%) had abdominal-specific accessible chromatin sites.
ChIP-qPCR experiments confirmed that the active H3K4me3 chromatin mark was significantly enriched at the promoter regions of HOXA5 and HOXA3 in abdominal preadipocytes, whereas the repressive H3K27me3 mark was less abundant than in GF chromatin. This is consistent with their A-fat-specific expression pattern. Conversely, the promoter regions of the GF-specific HOTAIR and HOXC13 genes exhibited high H3K4me3 and low H3K27me3 levels in GF chromatin compared with A chromatin. Conclusions: Global transcriptome and open chromatin analyses of depot-specific preadipocytes identified their gene expression signatures and differential open chromatin profiles. Interestingly, A-fat-specific open chromatin regions can be observed in the proximity of GF-fat genes, but not vice versa. Trial registration: Clinicaltrials.gov, NCT01745471. Registered 5 December 2012. Electronic supplementary material: The online version of this article (10.1186/s13148-018-0582-0) contains supplementary material, which is available to authorized users.
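The "accessible chromatin site within 200 kb of the TSS" criterion used in the Results of this abstract is a windowed proximity test between gene TSSs and ATAC-seq peaks. The sketch below shows the logic on made-up coordinates, assuming closed peak intervals and a gene counted when the distance from its TSS to the nearest depot-specific peak on the same chromosome is at most 200,000 bp. The tuple layouts, function names and toy data are hypothetical; for genome-scale data an interval tool such as bedtools would normally be used instead of this brute-force scan.

```python
"""Sketch: which genes have a peak within a fixed window of their TSS?

Assumptions (hypothetical): peaks are closed intervals (chrom, start, end);
genes are (name, chrom, tss); distance is 0 if the TSS lies inside a peak.
"""
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]   # (chrom, start, end)
Gene = Tuple[str, str, int]   # (gene_name, chrom, tss)


def distance_to_interval(pos: int, start: int, end: int) -> int:
    """Base-pair distance from a position to a closed interval."""
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end


def genes_with_nearby_peak(genes: List[Gene], peaks: List[Peak],
                           window: int = 200_000) -> List[str]:
    """Names of genes whose TSS is within `window` bp of any peak."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits: List[str] = []
    for name, chrom, tss in genes:
        # Brute-force scan of peaks on the same chromosome.
        if any(distance_to_interval(tss, s, e) <= window
               for s, e in by_chrom.get(chrom, [])):
            hits.append(name)
    return hits


if __name__ == "__main__":
    genes = [("G1", "chr7", 1_000_000), ("G2", "chr7", 5_000_000),
             ("G3", "chr12", 300_000)]
    peaks = [("chr7", 1_150_000, 1_150_500), ("chr12", 600_000, 600_400)]
    hits = genes_with_nearby_peak(genes, peaks)
    print(f"{len(hits)} of {len(genes)} genes ({100 * len(hits) / len(genes):.0f}%)")
```

Run once with A-specific peaks and once with GF-specific peaks against the same gene list, this kind of count yields the "n of N genes (x%)" comparisons reported in the abstract.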
The computational identification of long non-coding RNAs (lncRNAs) is important for studying lncRNAs and their functions. Despite the existence of many computational tools for lncRNA identification, to our knowledge there has been no systematic evaluation of these tools on common datasets and no consensus regarding their performance or the importance of the features they use. To fill this gap, we assessed the performance of 17 tools on several common datasets. We also investigated the importance of the features used by the tools. We found that the deep learning-based tools performed best at identifying lncRNAs and that peptide features contribute little to tool accuracy. Moreover, when the transcripts from a specific cell type were considered, the performance of all tools dropped significantly, and the deep learning-based tools no longer outperformed the other tools. Our study can serve as a useful starting point for selecting tools and features for lncRNA identification.
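Comparing many tools on common datasets comes down to scoring each tool's binary calls against the same reference labels. The sketch below computes standard confusion-matrix metrics and ranks tools by F1. The abstract does not list the exact metrics used in the study, so the metric set, the ranking by F1, and the tool names and predictions in the example are all illustrative assumptions.

```python
"""Sketch: score several lncRNA classifiers on a shared labeled dataset.

Assumptions: labels are 1 = lncRNA, 0 = non-lncRNA (e.g. mRNA); the metric
set and ranking by F1 are illustrative, not the study's protocol.
"""
from typing import Dict, List, Sequence


def _safe_div(a: float, b: float) -> float:
    return a / b if b else 0.0


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Dict[str, float]:
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lengths differ")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _safe_div(tp, tp + fn)   # recall on lncRNAs
    specificity = _safe_div(tn, tn + fp)
    precision = _safe_div(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": _safe_div(tp + tn, len(pairs)),
        "f1": _safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


def rank_tools(y_true: Sequence[int],
               predictions: Dict[str, Sequence[int]]) -> List[str]:
    """Tool names sorted by F1 on the shared dataset, best first."""
    scores = {name: classification_metrics(y_true, pred)["f1"]
              for name, pred in predictions.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]
    tools = {"toolA": [1, 1, 0, 0, 0, 1],   # hypothetical tool outputs
             "toolB": [1, 1, 1, 1, 0, 0]}
    for name in rank_tools(y_true, tools):
        print(name, classification_metrics(y_true, tools[name]))
```

Re-running the same scoring on a cell-type-specific subset of transcripts is how a drop in performance like the one described above would show up.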