Background: Non-coding RNAs (ncRNAs) play crucial roles in many biological processes, such as post-transcriptional gene regulation. ncRNAs mainly function through interactions with RNA-binding proteins (RBPs). A fundamental step in understanding the function of an ncRNA is therefore to identify which proteins it interacts with. Computationally predicting the interacting RBPs is promising, but the major challenge is that the interaction patterns or motifs are difficult to find.

Results: In this study, we propose a computational method, IPMiner (Interaction Pattern Miner), to predict ncRNA-protein interactions from sequences. It makes use of deep learning and further improves its performance using stacked ensembling. A key merit of IPMiner is that it mines hidden sequential interaction patterns from the sequence composition features of protein and RNA sequences using a stacked autoencoder; the learned hidden features are then fed into random forest models. Finally, stacked ensembling integrates the different predictors to further improve prediction performance. The experimental results indicate that IPMiner achieves superior performance on the tested lncRNA-protein interaction dataset, with an accuracy of 0.891, sensitivity of 0.939, specificity of 0.831, precision of 0.945 and Matthews correlation coefficient of 0.784. We further evaluate IPMiner comprehensively on other RNA-protein interaction datasets, where it outperforms state-of-the-art methods, with an improvement of over 20% on some of the benchmark datasets. In addition, we apply IPMiner to large-scale prediction of the ncRNA-protein network, where it achieves promising prediction performance.

Conclusion: By integrating a deep neural network and stacked ensembling, IPMiner can automatically learn high-level abstract features from simple sequence composition features, and these features have strong discriminative ability for detecting RNA-protein interactions. IPMiner achieved high performance on our constructed lncRNA-protein benchmark dataset and other RNA-protein datasets. The IPMiner tool is available at http://www.csbio.sjtu.edu.cn/bioinf/IPMiner.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2931-8) contains supplementary material, which is available to authorized users.
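The five evaluation measures reported for IPMiner (accuracy, sensitivity, specificity, precision and Matthews correlation coefficient) all follow from the four counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are illustrative assumptions, not taken from the IPMiner code.

```python
import math


def _safe_div(numerator, denominator):
    # Assumption: report 0.0 instead of raising when a denominator is zero.
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives.
    """
    total = tp + fp + tn + fn
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": _safe_div(tp + tn, total),
        "sensitivity": _safe_div(tp, tp + fn),   # recall on interacting pairs
        "specificity": _safe_div(tn, tn + fp),   # recall on non-interacting pairs
        "precision": _safe_div(tp, tp + fp),
        "mcc": _safe_div(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four counts symmetrically, which is why it is often reported alongside sensitivity and specificity when the classes are not equally represented.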
Background: Adenosine-5′-triphosphate (ATP) is a multifunctional nucleotide that plays an important role in cell biology as a coenzyme interacting with proteins. Revealing the binding sites between proteins and ATP is important for understanding the functions of the proteins and the mechanisms of protein-ATP complexes.

Results: In this paper, we propose a novel framework for predicting the functional residues through which proteins bind ATP molecules. The new prediction protocol combines sequence evolutionary information, bi-profile sampling of multi-view sequential features, and sequence-derived structural features. The hypothesis behind this strategy is that a single-view feature can represent only part of the knowledge about the target, whereas multiple sources of descriptors can be complementary.

Conclusions: Prediction performance evaluated by both 5-fold and leave-one-out jackknife cross-validation tests on two benchmark datasets, consisting of 168 and 227 non-homologous ATP-binding proteins respectively, demonstrates the efficacy of the proposed protocol. Our experimental results also reveal that the structural characteristics of residues at real protein-ATP binding sites are significantly different from those of other residues: for example, the binding residues do not show a propensity for high solvent accessibility, and binding tends to occur at the junctions between different secondary structure segments. Furthermore, experiments with multiple ratios of positive to negative samples show that performance is affected by imbalance in the training datasets. Increasing the scale of the dataset is also shown to be useful for improving prediction performance.
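One way to vary the ratio between positive (binding) and negative (non-binding) residues in a training set is to subsample the negatives at random. The abstract does not say how the ratios were produced, so the sketch below is an assumption throughout: the function name `subsample_negatives`, the 0/1 label encoding, and the use of random undersampling are all illustrative.

```python
import random


def subsample_negatives(labels, neg_per_pos, seed=0):
    """Return sorted sample indices with about `neg_per_pos` negatives per positive.

    labels      : sequence of 0/1 values (1 = binding residue, an assumed encoding)
    neg_per_pos : desired number of negatives per positive sample
    seed        : seed for the random generator, so results are reproducible
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than are available.
    n_keep = min(len(negatives), int(round(neg_per_pos * len(positives))))
    kept_negatives = rng.sample(negatives, n_keep)
    return sorted(positives + kept_negatives)


if __name__ == "__main__":
    # Hypothetical, heavily imbalanced label vector: 10 positives, 90 negatives.
    toy_labels = [1] * 10 + [0] * 90
    for ratio in (1, 2, 5):
        chosen = subsample_negatives(toy_labels, ratio)
        n_pos = sum(toy_labels[i] for i in chosen)
        print(f"ratio 1:{ratio} -> {n_pos} positives, {len(chosen) - n_pos} negatives")
```

Training the same predictor on the index sets returned for several ratios, and evaluating each on an untouched test set, gives a simple way to observe how class imbalance affects performance.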
Fine-grained classification and grading of breast cancer (BC) histopathological images are of great value in clinical applications. However, automatic classification and grading of BC histopathological images are complicated by two factors: (1) BC histopathological images have small inter-class variance and large intra-class variance, and (2) features extracted from similar histopathological images at different magnifications are quite different. To address these issues, an improved deep convolutional neural network model is proposed, and the procedure can be divided into three main stages. Firstly, in the representation learning process, a multiclass recognition task and a verification task on image pairs are combined. Secondly, in the feature extraction process, prior knowledge is incorporated, namely that "the variance in feature outputs between different subclasses is relatively large, while the variance within the same subclass is small." Additionally, the prior information that histopathological images at different magnifications belong to the same subclass is embedded in the feature extraction process, which makes the model less sensitive to image magnification. The experimental results on three different histopathological image datasets show that the proposed method performs better than the state of the art, with better robustness and generalization ability.

Keywords: Multi-task deep learning • Histopathological image classification • Fine-grained • Convolutional neural network • Breast cancer

Lingqiao Li and Xipeng Pan contributed equally.
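The combination of a multiclass recognition task with a verification task on image pairs can be pictured as a single loss with two terms. The sketch below is only an illustration under stated assumptions: it uses softmax cross-entropy for recognition and a contrastive loss for verification, and the `weight` and `margin` parameters and all function names are hypothetical. The abstract does not give the authors' exact loss formulation.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` against the true class index."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def verification_loss(feat_a, feat_b, same_subclass, margin=1.0):
    """Contrastive loss on a feature pair (an assumed form of the verification task).

    Same-subclass pairs are pulled together; different-subclass pairs are
    pushed apart until their distance exceeds `margin`.
    """
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(feat_a, feat_b)))
    if same_subclass:
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, margin - dist) ** 2


def multitask_loss(logits_a, label_a, feat_a,
                   logits_b, label_b, feat_b,
                   weight=0.5, margin=1.0):
    """Recognition loss on both images plus a weighted verification loss."""
    recognition = (softmax_cross_entropy(logits_a, label_a)
                   + softmax_cross_entropy(logits_b, label_b))
    verification = verification_loss(feat_a, feat_b, label_a == label_b, margin)
    return recognition + weight * verification


if __name__ == "__main__":
    # Hypothetical pair: two images of the same subclass, e.g. at two magnifications.
    loss = multitask_loss([2.0, 0.1, -1.0], 0, [0.2, 0.4],
                          [1.5, 0.3, -0.5], 0, [0.3, 0.5])
    print(f"combined loss: {loss:.4f}")
```

Under this formulation, treating two magnifications of the same tissue as a same-subclass pair penalizes any distance between their features, which is one way the magnification prior described above could be expressed.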