2020
DOI: 10.1101/2020.06.18.158253
Preprint

T4SE-XGB: interpretable sequence-based prediction of type IV secreted effectors using eXtreme gradient boosting algorithm

Abstract: Type IV secreted effectors (T4SEs) can be translocated into the cytosol of host cells via the type IV secretion system (T4SS) and cause disease. However, experimental approaches to identifying T4SEs are time- and resource-consuming, and existing computational tools based on machine learning have clear limitations, such as the lack of interpretability of their prediction models. In this study, we proposed a new model, T4SE-XGB, which uses the eXtreme gradient boosting (XGBoost) algorithm for accurat…

Citations: cited by 8 publications (6 citation statements)
References: 93 publications
“…In traditional gradient boosting, each new tree specifically focuses on the error of the previous tree. XGBoost adds more regularization terms in the model to control model over-fitting, which makes the model have a better performance ( Chen and Guestrin, 2016 ; Chen et al, 2020 ). In this study, “XGBClassifier” from “xgboost” library 7 was used for prediction.…”
Section: Methods (mentioning)
confidence: 99%
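
The idea in the statement above (each new tree fits the error left by the previous ones, with a regularization term limiting over-fitting) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the `xgboost` library and not the citing authors' pipeline: it assumes squared loss, one input feature, depth-1 trees, and an L2 penalty `lam` on leaf weights, so each leaf weight is the residual sum divided by (leaf size + `lam`). The names `fit_stump`, `boost`, `mse`, and the data are hypothetical.

```python
# Toy gradient boosting with regularized decision stumps (squared loss, one feature).
# Illustrative only; all names and data here are hypothetical.

def fit_stump(xs, residuals, lam):
    """Return (threshold, left_weight, right_weight) with the best regularized gain."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right leaf is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        # With squared loss each sample has hessian 1, so a leaf scores G^2 / (n + lam).
        gain = sum(left) ** 2 / (len(left) + lam) + sum(right) ** 2 / (len(right) + lam)
        if best is None or gain > best[0]:
            # Leaf weight G / (n + lam): a larger lam shrinks the weight toward zero.
            best = (gain, t, sum(left) / (len(left) + lam), sum(right) / (len(right) + lam))
    return best[1:]

def boost(xs, ys, n_rounds=20, learning_rate=0.3, lam=1.0):
    """Fit n_rounds stumps, each one to the residuals of the current ensemble."""
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # error of the trees so far
        t, w_left, w_right = fit_stump(xs, residuals, lam)
        stumps.append((t, w_left, w_right))
        preds = [p + learning_rate * (w_left if x <= t else w_right)
                 for x, p in zip(xs, preds)]
    return stumps, preds

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
_, preds_few = boost(xs, ys, n_rounds=1)
_, preds_many = boost(xs, ys, n_rounds=20)
print(mse(ys, preds_few), mse(ys, preds_many))
```

Setting `lam=0.0` recovers the plain mean of the residuals in each leaf; a positive `lam` makes each tree's correction more conservative, which is one of the ways regularization trades training fit for robustness.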
“…CNN-T4SE integrated three Convolutional Neural Network models training the amino acid composition, solvent accessibility and secondary structure of full-length T4SEs, achieving better performance than other tools and lower false positive predictions [298] . Other groups adopted an alternative strategy, by selecting the best optimized features, and/or training and identifying the best machine learning models, to improve the prediction performance [299] , [149] , [150] , [151] . Some of the models have been well applied in identification of T4SEs in L. pneumophila [151] and Anaplasma phagocytophilum (OPT4e; [150] ).…”
Section: Outer Membrane and Two-membrane Spanning Secretion Systems (mentioning)
confidence: 99%
“…Importance measure of every aligned position to the predictive performance on test sets of each model was calculated by the SHAP package, 41 which was frequently adopted to understand sequence-property relationship in proteins. [42][43][44] For every one-hot feature of an FP sequence that undergoes prediction, SHAP assigns an importance measure to the feature called the SHAP value. A positive SHAP value corresponds to a positive contribution of the feature value to the predicted target, while a higher SHAP value corresponds to a higher importance of the feature value to the prediction of the target.…”
Section: Feature Importance Calculations (mentioning)
confidence: 99%
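
To make the sign and magnitude of a SHAP value concrete, the sketch below computes exact Shapley values by brute force for a small hypothetical scoring function. It is not the SHAP package's API: that package relies on faster estimators and on background data, whereas this example assumes that a feature "absent" from a coalition is simply reset to a fixed baseline value. The names `shapley_values` and `toy_model` and the three binary features are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value (an assumption
    of this sketch). Cost grows as 2^n, so this only suits tiny inputs.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    # Hypothetical scorer over three one-hot-style (0/1) features.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]

x = [1, 1, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this toy case the first feature receives a positive value (it raises the score), the second a negative one (it lowers the score), and the values sum to the difference between the score at `x` and the score at the baseline.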