NeuroPpred-Fuse: an interpretable stacking model for prediction of neuropeptides by fusing sequence information and feature selection methods

Jiang, Mingming; Zhao, Bowen; Luo, Shenggan; Wang, Qiankun; Chu, Yanyi; Chen, Tianhang; Mao, Xueying; Liu, Yatong; Wang, Yanjing; Jiang, Xue; Wei, Dong‐Qing; Xiong, Yi

doi:10.1093/bib/bbab310

Cited by 29 publications

(19 citation statements)

References 52 publications


“…AAC has been used as features for building many different models, which consists of the relative abundances of the 20 types of natural amino acids in a specified peptide segment. Given a kind of residue, $R_i$ represents the occurrent frequency of the residue; its relative abundance can be calculated as follows: $f_i = \frac{R_i}{N}, \quad (i = 1, 2, 3, \ldots, 20)$, where $N$ refers to the length of a specified peptide; thus, we can get the AAC feature vector of the peptide segment as $F_{\mathrm{AAC}} = (f_1, f_2, f_3, \ldots, f_{20})$ …”

Section: Materials and Methods (mentioning)

confidence: 99%

NeuroPpred-SVM: A New Model for Predicting Neuropeptides Based on Embeddings of BERT

Liu, Wang, et al. 2023

J. Proteome Res.


Neuropeptides play pivotal roles in different physiological processes and are related to different kinds of diseases. Identification of neuropeptides is of great benefit for studying the mechanism of these physiological processes and the treatment of neurological disorders. Several state-of-the-art neuropeptide predictors have been developed by using a two-layer stacking ensemble algorithm. Although the two-layer stacking ensemble algorithm can improve the feature representability, these models are complex, which are not as efficient as the models based on one classifier. In this study, we proposed a new model, NeuroPpred-SVM, to predict neuropeptides based on the embeddings of Bidirectional Encoder Representations from Transformers and other sequential features by using a support vector machine (SVM). The experimental results indicate that our model achieved a cross-validation area under the receiver operating characteristic (AUROC) curve of 0.969 on the training data set and an AUROC of 0.966 on the independent test set. By comparing our model with the other four state-of-the-art models including NeuroPIpred, PredNeuroP, NeuroPpred-Fuse, and NeuroPpred-FRL on the independent test set, our model achieved the highest AUROC, Matthews correlation coefficient, accuracy, and specificity, which indicate that our model outperforms the existing models. We believed that NeuroPpred-SVM could be a useful tool for identifying neuropeptides with high accuracy and low cost. The data sets and Python code are available at .
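The amino acid composition (AAC) feature defined in the citation statement above can be illustrated with a minimal Python sketch. It counts each residue ($R_i$), divides by the peptide length ($N$), and returns the 20 values $f_i$ as a vector. The function name `aac`, the alphabetical ordering of the one-letter codes, and the handling of non-standard residues are assumptions made for illustration; the cited papers do not specify them here.

```python
from collections import Counter
from typing import List

# Assumption: the 20 standard amino acids, ordered alphabetically by
# one-letter code. The source does not state which ordering is used.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(peptide: str) -> List[float]:
    """Return the 20-dimensional AAC vector (f_1, ..., f_20) of a peptide."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must be non-empty")
    counts = Counter(seq)  # R_i: number of occurrences of each residue
    n = len(seq)           # N: length of the peptide
    # Non-standard characters (e.g. 'X') count toward N but get no entry.
    return [counts[aa] / n for aa in AMINO_ACIDS]


features = aac("ACCA")  # 'A' and 'C' each make up half of this toy peptide
```

For a peptide made only of standard residues, the 20 entries sum to 1, which is a quick sanity check on any AAC implementation.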



“…Many previous studies have shown that the ensemble model can achieve better predictive performance than single models in the ensemble, and reduce the generalization error of the prediction ( Charoenkwan et al., 2021 ; Mishra et al., 2019 ; Basith et al., 2022 ; Liang et al., 2021 ; Jiang et al., 2021 ; Guo et al., 2021 ). The existing ensemble learning strategies include boosting, bagging, and stacking ( Verma and Mehta, 2017 ).…”

Section: Methods (mentioning)

confidence: 99%

“…Second, three-quarters of existing methods only applied a single algorithm. However, lots of studies have proven that the ensemble learning model usually outperforms the single-algorithm-based model ( Guo et al., 2021 ; Jiang et al., 2021 ; Basith et al., 2022 ; Liang et al., 2021 ; Mishra et al., 2019 ). Thus, the utilization of an ensemble learning strategy might improve the performance of AIP identification.…”

Section: Introduction (mentioning)

confidence: 99%

Prediction of anti-inflammatory peptides by a sequence-based stacking ensemble model named AIPStack

Deng, Lou, Wu, et al. 2022

iScience


“…In addition, we compared the proposed method with previously published methods such as PredT4SE-Stack and NeuroPpred-Fuse [55,56]. Therefore, our approach was only compared them from the structural aspect.…”

Section: Principle of Machine Learning Algorithm and Fusion Model (mentioning)

confidence: 99%

Method for predicting the remaining mileage of electric vehicles based on dimension expansion and model fusion

Sheng et al. 2022

IET Intelligent Trans Sys


Accurately predicting the remaining mileage of electric vehicles (EVs) can effectively alleviate user's mileage anxiety and develop refinement of energy management strategy. However, traditional prediction methods not only consume time and resources, but also accumulate errors and lack interpretability. In this paper, we proposed a model based on dimension expansion and model fusion strategy, which uses the extreme gradient boosting (XGBoost) algorithm to directly predict the remaining mileage of EVs. After pre‐processing the real running data of EVs, we constructed the field of remaining driving range and analyzed the relationship between features and remaining driving range, and then directly predicted the remaining driving mileage. Compared with other machine learning methods, XGBoost model has the highest accuracy. Then dimensional extended data set was obtained based on prior knowledge and symbol conversion, which improved the model performance. Finally, the model fusion strategy was adopted to further improve the generalization ability and stability of the model. The experimental results show that the Bootstrap aggregating (Bagging) fusion model has the highest predictive performance on the test set and outperformed other methods. The maximum RAE is not more than 3.5%, RMSE is less than 3km and MAE is about 2 km.

