The complex language of eukaryotic gene expression remains incompletely understood. Although many protein variants are statistically associated with human disease, which suggests their importance, nearly all such variants have unknown mechanisms, for example in protein-protein interactions (PPIs). In this study, we address this challenge using a recent machine learning advance, deep neural networks (DNNs). We aim to improve the performance of PPI prediction and propose a method called DeepPPI (Deep neural networks for Protein-Protein Interactions prediction), which employs deep neural networks to learn effective representations of proteins from common protein descriptors. The experimental results indicate that DeepPPI achieves superior performance on the test data set, with an Accuracy of 92.50%, Precision of 94.38%, Recall of 90.56%, Specificity of 94.49%, Matthews Correlation Coefficient of 85.08%, and Area Under the Curve of 97.43%. Extensive experiments show that DeepPPI can learn useful features of protein pairs by layer-wise abstraction, and thus achieves better prediction performance than existing methods. The source code of our approach is available at http://ailab.ahu.edu.cn:8087/DeepPPI/index.html .
Oversampling is an efficient technique for dealing with the class-imbalance problem. It addresses the problem by replicating or generating minority class samples to balance the distribution between the majority and minority classes. The synthetic minority oversampling technique (SMOTE) is one of the typical representatives. During the past decade, researchers have proposed many variants of SMOTE. However, the existing oversampling methods may generate wrong minority class samples in some scenarios. Furthermore, how to effectively mine the inherent complex characteristics of imbalanced data remains a challenge. To this end, this paper proposes a parameter-free data cleaning method, based on the constructive covering algorithm, to improve SMOTE. The dataset generated by SMOTE is first partitioned into a group of covers; the hard-to-learn samples are then detected based on the characteristics of the sample space distribution. Finally, a pair-wise deletion strategy is proposed to remove the hard-to-learn samples. The experimental results on 25 imbalanced datasets show that our proposed method is superior to the comparison methods in terms of various metrics, such as F-measure, G-mean, and Recall. Our method not only reduces the complexity of the dataset but also improves the performance of the classification model.
INDEX TERMS: Imbalanced data, SMOTE, oversampling, constructive covering algorithm, data cleaning.
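To make the oversampling step concrete, the following is a minimal sketch of basic SMOTE-style interpolation: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. It illustrates only the generation step that the abstract builds on, not the proposed constructive-covering cleaning or the pair-wise deletion strategy. The function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric tuples (at least two samples).
    k: number of nearest minority neighbours considered for each base sample.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        # Place the new point at a random position on the segment base -> mate.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for point in smote_sketch(minority, n_new=5):
    print(point)
```

Because every synthetic point lies between two minority samples, a point can land inside a majority-class region when those two samples straddle it, which is the kind of wrong sample that a subsequent cleaning step is meant to remove.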
With the rapid growth of web services on the Internet, it has become more difficult for users to choose high-quality web services from a large number of functionally equivalent candidates. Therefore, predicting quality of service (QoS) values from the invocation history of web services has received extensive attention. In recent years, deep learning has achieved great success in speech recognition, image processing, and natural language understanding. However, it is rarely applied to the service recommendation field. Therefore, a novel approach for QoS prediction named NDL (neighborhood-aware deep learning) is proposed. NDL first finds the Top-k neighbors of the user and of the service using the Pearson correlation coefficient computed from the service QoS information. Then, it extracts the latent features of the user neighbors and the service neighbors; after that, it feeds the QoS values of the user and the user neighbors, together with the QoS values of the service and the service neighbors, into a convolutional neural network. The results of experiments conducted on a real-world dataset demonstrate that NDL significantly outperforms existing QoS prediction methods in prediction accuracy.
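The neighbor-selection step described above can be sketched as follows. This is a minimal illustration, assuming a user-by-service QoS matrix in which `None` marks a service the user has never invoked, and computing the Pearson correlation coefficient over co-invoked services only. The function names, the handling of missing values, and the toy matrix are assumptions; the abstract does not specify these details, and the feature extraction and convolutional network are not shown.

```python
import math


def pearson(a, b):
    """Pearson correlation of two QoS rows over the entries both rows have."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(common) < 2:
        return 0.0
    xs, ys = zip(*common)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in common)
    den = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return num / den if den else 0.0


def top_k_neighbours(matrix, target, k):
    """Return indices of the k rows most correlated with row `target`."""
    scored = [
        (pearson(matrix[target], row), i)
        for i, row in enumerate(matrix)
        if i != target
    ]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Toy user-by-service response-time matrix (hypothetical values).
qos = [
    [1.0, 2.0, 3.0, None],
    [2.0, 4.0, 6.0, 1.0],
    [3.0, 2.0, 1.0, 5.0],
    [1.0, 2.1, 2.9, 0.5],
]
print(top_k_neighbours(qos, target=0, k=2))
```

Service neighbors can be found the same way by applying `top_k_neighbours` to the transposed matrix, so that each row holds one service's QoS values across users.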