Background:
Many malaria infections are caused by Plasmodium falciparum. Accurate classification of the proteins secreted by the malaria parasite, which are essential for the development of anti-malarial drugs, is essential.
Objective:
To accurately classify the proteins secreted by the malaria parasite.
Methods:
Therefore, in order to improve the accuracy of the prediction of plasmodium secreted proteins, we established a classification model MGAP-SGD. MonodikGap features (k=7) of the secreted proteins were extracted, and then the optimal features were selected by the AdaBoost method. Finally, based on the optimal set of secreted proteins, the model was used to predict the secreted proteins using the stochastic gradient descent (SGD) algorithm.
Results:
Our model uses a 10-fold cross-validation set and independent test set in the stochastic gradient descent (SGD) classifier to validate the model, and the accuracy rates are 98.5859% and 97.973%, respectively.
Conclusion:
This also fully proves that the effectiveness and robustness of the prediction results of the MGAP-SGD model can meet the prediction needs of the secreted proteins of plasmodium.
Immunoglobulin, which is also called an antibody, is a type of serum protein produced by B cells that can specifically bind to the corresponding antigen. Immunoglobulin is closely related to many diseases and plays a key role in medical and biological circles. Therefore, the use of effective methods to improve the accuracy of immunoglobulin classification is of great significance for disease research. In this paper, the CC–PSSM and monoTriKGap methods were selected to extract the immunoglobulin features, MRMD1.0 and MRMD2.0 were used to reduce the feature dimension, and the effect of discriminating the two–dimensional key features identified by the single dimension reduction method from the mixed two–dimensional key features was used to distinguish the immunoglobulins. The data results indicated that monoTrikGap (k = 1) can accurately predict 99.5614% of immunoglobulins under 5-fold cross–validation. In addition, CC–PSSM is the best method for identifying mixed two–dimensional key features and can distinguish 92.1053% of immunoglobulins. The above proves that the method used in this paper is reliable for predicting immunoglobulin and identifying key features.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.