R package 'nsgp' is available at www.ibisc.fr/en/logiciels_arobas.
Motivation: Piwi-interacting RNA (piRNA) is the most recently discovered and least investigated class of Argonaute/Piwi protein-interacting small non-coding RNAs. piRNAs are mostly known for protecting the genome from invasive transposable elements, but recent discoveries suggest that they are also involved in the pathophysiology of diseases such as cancer. Their identification is therefore an important task, and computational methods are needed. However, the lack of conserved piRNA sequences and structural elements makes this identification challenging. Results: In the present study, we propose a new modular and extensible machine learning method for piRNA identification, based on multiple kernels and a support vector machine (SVM) classifier. Very few piRNA features are known to date. The multiple-kernel approach makes it possible to edit, add or remove piRNA features, which may be heterogeneous, in a modular manner according to their relevance in a given species. Our algorithm combines previously identified features [sequence features (k-mer motifs and a uridine at the first position) and a piRNA cluster feature] with a new telomere/centromere vicinity feature. These features are heterogeneous, and the kernels provide a unified representation for them. The proposed algorithm, named piRPred, gives promising results on Drosophila and human data and outperforms previously published piRNA identification algorithms. Availability and implementation: piRPred is freely available to non-commercial users on our Web server EvryRNA, http://EvryRNA.ibisc.univ-evry.fr Contact: tahi@ibisc.univ-evry.fr
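The idea of unifying heterogeneous features through kernels can be illustrated with a minimal sketch. It is not the piRPred implementation: the three kernels, their weights, the toy sequences and the normalized `distances` values (standing in for a telomere/centromere vicinity feature) are all assumptions made for illustration. Each feature gets its own kernel, and a non-negative weighted sum of valid kernels is again a valid kernel.

```python
import math
from collections import Counter


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Cosine-normalized dot product of the k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(ca[m] * cb[m] for m in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())
                     * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def first_base_kernel(a, b):
    """1.0 when both sequences agree on starting (or not) with U/T."""
    starts_u = lambda s: s.startswith(("U", "T"))
    return 1.0 if starts_u(a) == starts_u(b) else 0.0


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel on a scalar feature (hypothetical vicinity score)."""
    return math.exp(-gamma * (x - y) ** 2)


def combined_gram(seqs, distances, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of the three kernels, returned as a Gram matrix."""
    w_seq, w_u, w_pos = weights  # illustrative weights, not learned
    n = len(seqs)
    return [[w_seq * spectrum_kernel(seqs[i], seqs[j])
             + w_u * first_base_kernel(seqs[i], seqs[j])
             + w_pos * rbf_kernel(distances[i], distances[j])
             for j in range(n)] for i in range(n)]


# Toy data: hypothetical sequences and normalized vicinity scores.
seqs = ["UGACUGACUGAC", "UGACUGACUGAA", "CCCGGGCCCGGG"]
distances = [0.1, 0.2, 0.9]
K = combined_gram(seqs, distances)
```

A Gram matrix such as `K` can be handed to any SVM implementation that accepts precomputed kernels. Adding or removing a feature then amounts to adding or removing one term of the sum, which is the modularity the abstract describes.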
Identification of microRNAs (miRNAs) is an important step toward understanding post-transcriptional gene regulation and miRNA-related pathology. Difficulties in identifying miRNAs through experimental techniques, combined with the huge amount of data from new sequencing technologies, have made in silico discrimination of bona fide miRNA precursors from non-miRNA hairpin-like structures an important topic in bioinformatics. Among the various techniques developed for this classification problem, machine learning approaches have proved to be the most promising. However, these approaches require training data, which is problematic because the numbers of miRNAs (positive data) and non-miRNAs (negative data) are imbalanced, and this imbalance degrades their performance. To address this issue, we present an ensemble method that uses a boosting technique with support vector machine components to deal with imbalanced training data. Classification is performed after feature selection on 187 novel and existing features. The algorithm, miRBoost, performed better than state-of-the-art methods on imbalanced human and cross-species data. It also showed the highest ability among the tested methods to discover novel miRNA precursors. In addition, miRBoost was over 1400 times faster than the second most accurate tool tested and was significantly faster than most of the other tools. miRBoost thus provides a good compromise between prediction efficiency and execution time, making it highly suitable for genome-wide miRNA precursor prediction. The miRBoost software is available on our web server http://EvryRNA.ibisc.univ-evry.fr.
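The boosting idea can be sketched as follows. This is not the miRBoost algorithm: the weak learners here are one-dimensional threshold stumps swapped in for the SVM components to keep the example short, the class-balanced initial weights are one simple assumed way of handling imbalance, and the data are made up. The sketch shows only the generic AdaBoost loop, in which each round fits a weak learner on weighted samples and then raises the weights of the samples it misclassified.

```python
import math


def stump_predict(x, threshold, polarity):
    """Predict +polarity at or above the threshold, -polarity below it."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, w):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(wi for wi, x, y in zip(w, xs, ys)
                      if stump_predict(x, t, polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost(xs, ys, rounds=5):
    """AdaBoost with class-balanced initial weights (labels are +1/-1)."""
    n_pos = sum(1 for y in ys if y == 1)
    n_neg = len(ys) - n_pos
    # Each class starts with total weight 0.5, whatever its size.
    w = [0.5 / n_pos if y == 1 else 0.5 / n_neg for y in ys]
    model = []
    for _ in range(rounds):
        err, t, pol = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, pol))
        # Misclassified samples gain weight, correct ones lose weight.
        w = [wi * math.exp(-alpha * y * stump_predict(x, t, pol))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all weak learners."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in model)
    return 1 if score >= 0 else -1


# Toy imbalanced data: 2 positives, 8 negatives (hypothetical scores).
xs = [0.9, 0.8, 0.1, 0.2, 0.3, 0.15, 0.25, 0.35, 0.4, 0.05]
ys = [1, 1, -1, -1, -1, -1, -1, -1, -1, -1]
model = adaboost(xs, ys, rounds=3)
```

With uniform initial weights, a learner that always predicts the majority class would already reach a low weighted error. Balancing the initial weights removes that shortcut.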
Background: The use of predictive gene signatures to assist clinical decisions is becoming increasingly important. Deep learning has huge potential for predicting phenotype from gene expression profiles. However, neural networks are viewed as black boxes that provide accurate predictions without any explanation. The demand for interpretable models is growing, especially in the medical field. Results: We focus on explaining the predictions of a deep neural network model built from gene expression data. The most important neurons and genes influencing the predictions are identified and linked to biological knowledge. Our experiments on cancer prediction show that: (1) the deep learning approach outperforms classical machine learning methods on large training sets; (2) our approach produces interpretations more coherent with biology than the state-of-the-art approaches; (3) we can provide a comprehensive explanation of the predictions for biologists and physicians. Conclusion: We propose an original approach for the biological interpretation of deep learning models for phenotype prediction from gene expression data. Since the model can find relationships between the phenotype and gene expression, we may assume that there is a link between the identified genes and the phenotype. The interpretation can therefore lead to new biological hypotheses to be investigated by biologists.
Abstract: We present progress on a comprehensive, modular, interactive modeling environment centered on overall regulation of blood pressure and body fluid homeostasis. We call the project SAPHIR, for "a Systems Approach for PHysiological Integration of Renal, cardiac, and respiratory functions". The project uses state-of-the-art multi-scale simulation methods. The basic core model will give succinct input-output (reduced-dimension) descriptions of all relevant organ systems and regulatory processes, and it will be modular, multi-resolution, and extensible, in the sense that detailed submodules of any process(es) can be "plugged in" to the basic model in order to explore, e.g., system-level implications of local perturbations. The goal is to keep the basic core model compact enough to ensure fast execution time (in view of eventual use in the clinic) and yet to allow elaborate, detailed modules of target tissues or organs, in order to focus on the problem area while maintaining the system-level regulatory compensations.
Background: With the rapid advancement of genomic sequencing techniques, massive production of gene expression data is becoming possible, which prompts the development of precision medicine. Deep learning is a promising approach for phenotype prediction (clinical diagnosis, prognosis, and drug response) based on gene expression profiles. Existing deep learning models are usually considered black boxes that provide accurate predictions but are not interpretable. However, accuracy and interpretability are both essential for precision medicine. In addition, most models do not integrate domain knowledge. Hence, the main focus of this paper is making deep learning models interpretable for medical applications by using prior biological knowledge. Results: We propose a new self-explainable deep learning model, called Deep GONet, that integrates the Gene Ontology into the hierarchical architecture of the neural network. This model is based on a fully-connected architecture constrained by the Gene Ontology annotations, such that each neuron represents a biological function. The experiments on cancer diagnosis datasets demonstrate that Deep GONet is both easily interpretable and effective at discriminating cancer from non-cancer samples. Conclusions: Our model provides an explanation of its predictions by identifying the most important neurons and associating them with biological functions, making the model understandable for biologists and physicians.
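A layer whose neurons each represent a biological function can be sketched as follows. The abstract does not say how Deep GONet enforces the Gene Ontology constraint, so this sketch assumes the simplest option, a hard binary mask that zeroes every gene-to-term connection not backed by an annotation. The actual model may use a different mechanism. Gene names, term names, annotations and weights are all hypothetical.

```python
def build_mask(genes, terms, annotations):
    """mask[i][j] is 1.0 if gene i is annotated to term j, else 0.0."""
    return [[1.0 if t in annotations.get(g, ()) else 0.0 for t in terms]
            for g in genes]


def masked_layer(x, weights, mask, bias):
    """ReLU(x @ (weights * mask) + bias) for a single sample x."""
    out = []
    for j in range(len(bias)):
        z = bias[j] + sum(x[i] * weights[i][j] * mask[i][j]
                          for i in range(len(x)))
        out.append(max(0.0, z))
    return out


# Hypothetical genes, ontology terms and annotations.
genes = ["geneA", "geneB", "geneC"]
terms = ["term_growth", "term_repair"]
annotations = {
    "geneA": {"term_growth"},
    "geneB": {"term_growth", "term_repair"},
    "geneC": {"term_repair"},
}

mask = build_mask(genes, terms, annotations)
weights = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # illustrative, not trained
bias = [0.0, 0.0]

# One expression profile, ordered like `genes`.
activations = masked_layer([1.0, 2.0, 4.0], weights, mask, bias)
```

Because `geneC` is not annotated to `term_growth`, changing its expression cannot change that neuron's activation. This is what allows a highly activated neuron to be read as evidence about one named biological function.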
Many computational tools have been proposed over the last two decades for predicting piRNAs, which are molecules with an important role in post-transcriptional gene regulation. However, these tools are mostly based on a single feature that is generally related to the sequence. Research on piRNAs is still at an early stage, and recent publications have revealed many new properties. Here, we propose an integrative approach for piRNA prediction that examines several types of genomic and epigenomic properties that can be used to characterize these molecules. We reviewed the literature and extracted a large number of piRNA features that have been observed experimentally in several species. These features are represented by different kernels in a Multiple Kernel Learning approach, implemented within an object-oriented framework. The resulting tool, called IpiRId, achieves more than 90% accuracy on the tested species (human, mouse and fly), outperforming all existing tools. In addition, our method makes it possible to study the validity of each feature in a given species. Finally, the tool is modular and easily extensible, and can be adapted for predicting other types of ncRNAs. The IpiRId software and its user-friendly web server are freely available to academic users at: https://evryrna.ibisc.univ-evry.fr/evryrna/.