Motivation: RNA binding proteins (RBPs) play important roles in post-transcriptional control of gene expression, including splicing, transport, polyadenylation and RNA stability. To model protein–RNA interactions by considering all available sources of information, it is necessary to integrate the rapidly growing RBP experimental data with the latest genome annotation, gene function, RNA sequence and structure. Such integration is possible by matrix factorization, where current approaches have an undesired tendency to identify only a small number of the strongest patterns with overlapping features. Because protein–RNA interactions are orchestrated by multiple factors, methods that identify discriminative patterns of varying strengths are needed.Results: We have developed an integrative orthogonality-regularized nonnegative matrix factorization (iONMF) to integrate multiple data sources and discover non-overlapping, class-specific RNA binding patterns of varying strengths. The orthogonality constraint halves the effective size of the factor model and outperforms other NMF models in predicting RBP interaction sites on RNA. We have integrated the largest data compendium to date, which includes 31 CLIP experiments on 19 RBPs involved in splicing (such as hnRNPs, U2AF2, ELAVL1, TDP-43 and FUS) and processing of 3’UTR (Ago, IGF2BP). We show that the integration of multiple data sources improves the predictive accuracy of retrieval of RNA binding sites. In our study the key predictive factors of protein–RNA interactions were the position of RNA structure and sequence motifs, RBP co-binding and gene region type. We report on a number of protein-specific patterns, many of which are consistent with experimentally determined properties of RBPs.Availability and implementation: The iONMF implementation and example datasets are available at https://github.com/mstrazar/ionmf.Contact: tomaz.curk@fri.uni-lj.siSupplementary information: Supplementary data are available at Bioinformatics online.
Point-based visualisations of large, multi-dimensional data from molecular biology can reveal meaningful clusters. One of the most popular techniques to construct such visualisations is t-distributed stochastic neighbor embedding (t-SNE), for which a number of extensions have recently been proposed to address issues of scalability and the quality of the resulting visualisations. We introduce openTSNE, a modular Python library that implements the core t-SNE algorithm and its extensions. The library is orders of magnitude faster than existing popular implementations, including those from scikit-learn. Unique to openTSNE is also the mapping of new data to existing embeddings, which can surprisingly assist in solving batch effects.Availability: openTSNE is available at https://github. com/pavlin-policar/openTSNE.
The yeast Saccharomyces cerevisiae is one of the oldest and most frequently used microorganisms in biotechnology with successful applications in the production of both bulk and fine chemicals. Yet, yeast researchers are faced with the challenge to further its transition from the old workhorse to a modern cell factory, fulfilling the requirements for next generation bioprocesses. Many of the principles and tools that are applied for this development originate from the field of synthetic biology and the engineered strains will indeed be synthetic organisms. We provide an overview of the most important aspects of this transition and highlight achievements in recent years as well as trends in which yeast currently lags behind. These aspects include: the enhancement of the substrate spectrum of yeast, with the focus on the efficient utilization of renewable feedstocks, the enhancement of the product spectrum through generation of independent circuits for the maintenance of redox balances and biosynthesis of common carbon building blocks, the requirement for accurate pathway control with improved genome editing and through orthogonal promoters, and improvement of the tolerance of yeast for specific stress conditions. The causative genetic elements for the required traits of the future yeast cell factories will be assembled into genetic modules for fast transfer between strains. These developments will benefit from progress in bio-computational methods, which allow for the integration of different kinds of data sets and algorithms, and from rapid advancement in genome editing, which will enable multiplexed targeted integration of whole heterologous pathways. The overall goal will be to provide a collection of modules and circuits that work independently and can be combined at will, depending on the individual conditions, and will result in an optimal synthetic host for a given production process.
Analysis of biomedical images requires computational expertize that are uncommon among biomedical scientists. Deep learning approaches for image analysis provide an opportunity to develop user-friendly tools for exploratory data analysis. Here, we use the visual programming toolbox Orange (http://orange.biolab.si) to simplify image analysis by integrating deep-learning embedding, machine learning procedures, and data visualization. Orange supports the construction of data analysis workflows by assembling components for data preprocessing, visualization, and modeling. We equipped Orange with components that use pre-trained deep convolutional networks to profile images with vectors of features. These vectors are used in image clustering and classification in a framework that enables mining of image sets for both novel and experienced users. We demonstrate the utility of the tool in image analysis of progenitor cells in mouse bone healing, identification of developmental competence in mouse oocytes, subcellular protein localization in yeast, and developmental morphology of social amoebae.
The human gut microbiota is increasingly recognized as an important factor in modulating innate and adaptive immunity through release of ligands and metabolites that translocate into circulation. Urbanizing African populations harbor large intestinal diversity due to a range of lifestyles, providing the necessary variation to gauge immunomodulatory factors. Here, we uncover a gradient of intestinal microbial compositions from rural through urban Tanzanian, towards European samples, manifested both in relative abundance and genomic variation observed in stool metagenomics. The rural population shows increased Bacteroidetes, led by Prevotella copri, but also presence of fungi. Measured ex vivo cytokine responses were significantly associated with 34 immunomodulatory microbes, which have a larger impact on circulating metabolites than non-significant microbes. Pathway effects on cytokines, notably TNF-α and IFN-γ, differential metabolome analysis and enzyme copy number enrichment converge on histidine and arginine metabolism as potential immunomodulatory pathways mediated by Bifidobacterium longum and Akkermansia muciniphila.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.