Background: Many genetic diseases are caused by mutations in non-coding regions of the genome. These mutations are frequently found in enhancer sequences, where they disrupt the regulatory program of the cell. Enhancers are short regulatory sequences in the non-coding part of the genome that are essential for the proper regulation of transcription. While experimental methods for identifying such sequences improve every year, our understanding of the rules governing enhancer activity has not progressed much in the last decade. This is especially true for tissue-specific enhancers, where predicting the specificity of enhancer activity remains clearly problematic. Results: We present a random-forest-based machine learning approach that matches the performance of current state-of-the-art methods for enhancer prediction. We then show that, like other published methods, it frequently cross-predicts enhancers as active in other tissues, which makes it less useful for predicting tissue-specific activity. Next, we show that this problem is related to a bias of enhancer-prediction models towards predicting gene promoters as active enhancers. Finally, we show that using a two-step classifier can reduce cross-prediction between tissues. Conclusions: We provide whole-genome predictions of human heart and brain enhancers obtained with the two-step classifier. Electronic supplementary material: The online version of this article (doi:10.1186/s12920-017-0264-3) contains supplementary material, which is available to authorized users.
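The two-step idea can be illustrated with a short sketch. The abstract does not say what each stage separates, so the split used here is an assumption motivated by the reported promoter bias: stage 1 separates active regulatory regions from background, and stage 2 separates enhancers from promoters among the regions that pass stage 1. To keep the example dependency-free, a nearest-centroid classifier stands in for the random forest, and the two-dimensional feature vectors are made up for illustration.

```python
from math import dist
from statistics import fmean


def centroid(rows):
    """Per-feature mean of a list of equal-length feature vectors."""
    return [fmean(column) for column in zip(*rows)]


class NearestCentroid:
    """Stand-in binary classifier (NOT a random forest): picks the class whose centroid is closer."""

    def fit(self, X, y):
        self.pos = centroid([x for x, label in zip(X, y) if label])
        self.neg = centroid([x for x, label in zip(X, y) if not label])
        return self

    def predict(self, x):
        return dist(x, self.pos) < dist(x, self.neg)


class TwoStepClassifier:
    """Cascade: a region is called an enhancer only if both stages accept it."""

    def __init__(self, step1, step2):
        self.step1 = step1  # assumed role: active regulatory region vs background
        self.step2 = step2  # assumed role: enhancer vs promoter, trained on active regions only

    def fit(self, X, is_active, is_enhancer):
        self.step1.fit(X, is_active)
        active = [(x, e) for x, a, e in zip(X, is_active, is_enhancer) if a]
        self.step2.fit([x for x, _ in active], [e for _, e in active])
        return self

    def predict(self, x):
        return self.step1.predict(x) and self.step2.predict(x)


# Hypothetical 2-D features for background, promoter and enhancer regions.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],   # background
     [5.0, 0.0], [5.2, 0.3], [4.8, 0.1],   # promoters
     [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]   # enhancers
is_active = [False] * 3 + [True] * 6
is_enhancer = [False] * 6 + [True] * 3

clf = TwoStepClassifier(NearestCentroid(), NearestCentroid()).fit(X, is_active, is_enhancer)
print(clf.predict([5.0, 5.0]))  # enhancer-like region → True
print(clf.predict([5.0, 0.2]))  # promoter-like region: passes step 1, rejected by step 2 → False
```

A promoter-like region looks active and passes stage 1, but stage 2 rejects it; that extra filter is what a cascade of this shape is meant to add. Any classifier exposing the same `fit`/`predict` methods, such as a random forest, could replace `NearestCentroid`.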
Existing methods for metatranscriptomic studies are still rare and under development. In this paper we present a new analytical pipeline combining contig assembly, gene selection and functional annotation. This pipeline allowed us to reconstruct contigs with very high unique mappability of the NGS reads (83%) and to select sequences encoding putative bacterial genes that also reached very high unique mappability (66%). We then applied our pipeline to study the faecal metatranscriptome of a Down syndrome (DS) mouse model, the Ts65Dn mouse, in order to identify differentially expressed transcripts. Recent studies have implicated dysbiosis of the gut microbiota in several central nervous system (CNS) disorders, including DS. Because individuals with DS have an increased prevalence of obesity, and because the complex symbiotic relationship between the gut microbiome and its host is strongly influenced by diet and nutrition, we also studied the effects of a high-fat diet (HFD) on transcriptomic changes in the mouse gut microbiome. Using our new pipeline, we found that, compared to wild-type (WT) mice, Ts65Dn mice showed elevated expression of genes involved in hypoxanthine metabolism, which contributes to oxidative stress, and down-regulated expression of genes involved in virulence and in interactions with host epithelial cells. Microbiomes of HFD-fed mice showed significantly higher expression of genes involved in membrane lipopolysaccharide and lipid biosynthesis, and decreased expression of osmoprotection and lysine fermentation genes, among others. We also found evidence that the mouse microbiota can express genes encoding neuromodulators, which may play a role in the development of compulsive overeating and obesity.
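Unique mappability, as reported above, can be computed from aligner output. The sketch below assumes one common definition: the fraction of all sequenced reads with exactly one reported alignment to the reference (here, the assembled contigs or the selected gene sequences). It also assumes alignments arrive as `(read_id, target_id)` pairs. The abstract does not state the exact definition or the alignment tool used, and the read and contig names are hypothetical.

```python
from collections import Counter


def unique_mappability(alignments, total_reads):
    """Fraction of all reads with exactly one reported alignment.

    alignments: iterable of (read_id, target_id) pairs, one per reported alignment
    (assumed input format); unmapped reads simply do not appear in it.
    """
    hits_per_read = Counter(read_id for read_id, _target in alignments)
    n_unique = sum(1 for n_hits in hits_per_read.values() if n_hits == 1)
    return n_unique / total_reads


# Hypothetical alignments of 4 reads: r2 is multi-mapped, r4 is unmapped.
alignments = [("r1", "contig_1"),
              ("r2", "contig_1"), ("r2", "contig_7"),
              ("r3", "contig_2")]
print(unique_mappability(alignments, total_reads=4))  # → 0.5
```

Dividing by the total read count, not the mapped count, means that unmapped and multi-mapped reads both lower the score. This is why the choice of reference (contigs vs. selected gene sequences) can move the figure from 83% to 66%.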
Our results reveal a DS-specific metatranscriptome profile and show that a high-fat diet affects the metabolism of the mouse gut microbiome by changing the activity of genes involved in lipid, sugar, protein and amino acid metabolism and in cell membrane turnover. Our new analytical pipeline combining contig assembly, gene selection and functional annotation provides new insights for metatranscriptomic studies.
Motivation: Functional annotation and enrichment analysis based on ontologies have become standard methods for analysing experimental results. Over the past decade, many methods have been proposed for statistically quantifying the enrichment of functional terms, and many implementations of these methods are available. As these methods grow in popularity, so does the need for tools that facilitate their automation. Results: We present a complete Python library for statistical enrichment analysis of gene sets and gene rankings, compatible with most available biological ontologies. It allows the user to perform all necessary steps: reading ontologies and gene annotations in multiple formats, performing enrichment analysis with various methods, and visualizing the results as readable reports. Importantly, our library includes methods for multiple hypothesis testing correction, including computation of false discovery rates. Availability: The library is compatible with recent versions of the Python interpreter (≥ 2.6 or ≥ 3.3) and is available on GitHub at https://github.com/regulomics/biopython together with API documentation and a tutorial. The sample Galaxy installation can be found at
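Here is a minimal sketch of the kind of analysis described above. It runs a one-sided hypergeometric test for over-representation of each ontology term in a study gene set, then applies a Benjamini-Hochberg adjustment to control the false discovery rate. This is a standalone illustration using only the Python standard library, not the API of the library described here. It assumes the study set is a subset of the population, and the gene and term names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population genes, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def enrichment(study, population, annotations):
    """Map each term to (p-value, q-value); annotations maps term -> set of genes."""
    N, n = len(population), len(study)
    terms = list(annotations)
    pvalues = [hypergeom_sf(len(annotations[t] & study), N,
                            len(annotations[t] & population), n) for t in terms]
    return dict(zip(terms, zip(pvalues, benjamini_hochberg(pvalues))))


# Hypothetical example: 10 background genes, 3 in the study set.
population = {f"g{i}" for i in range(1, 11)}
study = {"g1", "g2", "g3"}
annotations = {"term_A": {"g1", "g2", "g3"}, "term_B": {"g4", "g5"}}
for term, (p, q) in enrichment(study, population, annotations).items():
    print(term, round(p, 4), round(q, 4))
```

Reporting q-values next to raw p-values mirrors the multiple-testing step the abstract emphasises. When many terms are tested at once, raw p-values alone overstate significance.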