Protein-protein interactions (PPIs) in plants play a significant role in plant biology and in the functional organization of cells. Although a large amount of plant PPI data has been generated by high-throughput techniques, the complexity of the plant cell means that the PPI pairs obtained so far by experimental methods cover only a small fraction of the complete plant PPI network. In addition, experimental approaches for identifying PPIs in plants are laborious, time-consuming, and costly. Hence, it is highly desirable to develop more efficient approaches to detect PPIs in plants. In this study, we present a novel computational model that combines a weighted sparse representation-based classifier (WSRC) with a novel inverse fast Fourier transform (IFFT) representation scheme, which is applied to the position-specific scoring matrix (PSSM) to extract features from plant protein sequences. When the proposed method was applied to the plant PPI datasets of Maize, Rice and Arabidopsis thaliana (Arabidopsis), it achieved accuracies of 89.12%, 84.72% and 71.74%, respectively. To further assess the prediction performance of the proposed approach, we compared it with the state-of-the-art support vector machine (SVM) classifier. To the best of our knowledge, we are the first to employ protein sequence information to predict PPIs in plants. Experimental results demonstrate that the proposed method has great potential to become a powerful tool for exploring plant cell function.

The area under the Receiver Operating Characteristic curve (AUC) is also calculated to demonstrate the quality of the prediction model.
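As an illustration of the AUC criterion, the following minimal sketch computes the AUC as the probability that a randomly chosen interacting pair receives a higher score than a randomly chosen non-interacting pair, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the data sets used in this study; it is a sketch of the standard definition rather than the implementation used here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for an interacting pair, 0 for a non-interacting pair.
    scores: classifier outputs, where higher means "more likely to interact".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (not data from the paper).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 corresponds to a perfect ranking of interacting over non-interacting pairs, while 0.5 corresponds to random guessing. The pairwise loop is quadratic in the number of samples, so a rank-based formulation would be preferable for large data sets.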
Assessment of Prediction Ability. In this article, we used 5-fold cross-validation to evaluate the predictive ability of our model on three plant data sets: Maize, Rice and Arabidopsis. This helps guard against overfitting and tests the stability of the proposed method. More specifically, the whole data set is partitioned into five roughly equal parts; four of them are used to construct a training set and the remaining one is used as a testing set. Repeating this so that each part serves once as the testing set yields five models. Cross-validation has the advantage that it reduces the impact of data dependency and improves the reliability of the results.

The five-fold cross-validation results of the proposed approach on the three plant data sets are listed in Tables 1-3. From Table 1, we can observe that when the proposed method was applied to the Maize data set, the average accuracy, precision, sensitivity, and MCC were 89.12%, 87.49%, 91.32%, and 80.59%, with corresponding standard deviations of 0.59%, 1.38%, 0.64%, and 0.94%, respectively. On the Rice data set, the proposed method yielded an average accuracy, precision, sensitivity, and MCC of 84.72%, 85.04%, 84.44% and 84.10%, respectively. The standard deviations of these criteria are 0.73%, 0.85%, 0.65% and 1.00%, respectively. When predicting PPIs of Arabidopsis dataset, the proposed approach obtain...