2020
DOI: 10.1109/access.2020.2971091
SMOPredT4SE: An Effective Prediction of Bacterial Type IV Secreted Effectors Using SVM Training With SMO

Abstract: Various bacterial pathogens can deliver their secreted effectors to host cells via the type IV secretion system (T4SS) and cause host diseases. Since T4SS secreted effectors (T4SEs) play important roles in the interaction between pathogens and hosts, identifying T4SEs is crucial to understanding the pathogenic mechanism of T4SS. We established an effective predictor called SMOPredT4SE to identify T4SEs from protein sequences. SMOPredT4SE employed combination features of series correlation pseudo amino acid compo…
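The abstract describes a fairly standard sequence-classification pipeline: encode each protein as a fixed-length vector (series-correlation PseAAC combined with PSSM-derived features) and train a support vector machine with the sequential minimal optimization (SMO) algorithm. The sketch below is a minimal, hypothetical illustration of that pipeline, not the authors' code; scikit-learn's SVC stands in for the SMO-trained SVM because its libsvm backend uses an SMO-type solver, and the feature matrices, array shapes, and hyperparameters are placeholders.

```python
# Hedged sketch of an SMO-trained SVM on combined PseAAC + PSSM features.
# Assumptions: `pseaac` and `pssm` are precomputed per-protein feature matrices
# (one row per sequence); `labels` marks T4SEs as 1 and non-T4SEs as 0.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC  # libsvm backend: an SMO-type solver

rng = np.random.default_rng(0)
n_proteins = 915
pseaac = rng.random((n_proteins, 65))   # placeholder for PseAAC vectors
pssm = rng.random((n_proteins, 400))    # placeholder for PSSM-derived features
labels = rng.integers(0, 2, size=n_proteins)

# Combined feature representation: concatenate the two encodings per protein.
X = np.hstack([pseaac, pssm])

# Scale features, then fit an RBF-kernel SVM (illustrative hyperparameters).
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X, labels)
print("training accuracy:", model.score(X, labels))
```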

Citations: cited by 6 publications (5 citation statements)
References: 105 publications (54 reference statements)
“…FY, et al 2019 Prediction, Association Rules BN, SVM, NB, DT Q2 Population BN 92.35%, RMSE 0.26 10-fold cross-validation temperature (min, max, average), minimum humidity and rainfall
10.1109/BigDataCongress.2017.54 [52] The motivation behind this study is to provide a basic framework for biologists, which is based on big data analytics and deep learning models. Huaming Chen et al 2017 DL DL Q2, Q3 Proteomics protein–protein interaction
10.1109/ACCESS.2020.2971091 [48] SMOPredT4SE employed combination features of series correlation pseudo amino acid composition and position-specific scoring matrix to represent protein sequences, and employed support vector machines (SVM) to identify T4SEs. Zihao Yan et al 2020 Prediction, Classification SVM, RF, NB, kNN, Bagging, SGD, LibD3C Q2, Q3 Proteomics 95.60% 5-fold cross-validation composed of 305 T4SEs and 610 non-T4SEs
** Notations: ML: Machine Learning, DM: Data Mining, SVM: Support Vector Machine, ANN: Artificial Neural Network, DT: Decision Tree, RF: Random Forest, GBR: Generalized Boosted Regression, NB: Naïve Bayes, KNN: k-Nearest Neighbors, KM: k-Means, NetA: Network Analysis, RT: Regression Tree, DNN: Deep Neural Networks, PN: Phylogenetic Neighborhood, SVM-RBF-k: SVM-RBF kernel, DL: Deep Learning, BRT: Boosted Regression Tree, BN: Bayes Network, GB: Gradient Boosting, GrB: Generalized Boosted, AdaBoost: Adaptive Boosting, LR: Logistic Regression, HD-LDA: Hierarchical Divisive and Latent Dirichlet Allocation, GBMs: Gradient Boosting Machines, RBF-t: RBF tree, GB-t: gradient boosted tree, SVM-RLK: support vector machine (radial and linear kernel), CTA: Classification Tree Analysis, RRF: Regularized Random Forest, E-SVM: Ensemble of three SVMs, HA: Hierarchical Agglomerative, C: Clustering, GLMM: Generalized Linear Mixed Models, SVM-Lk: SVM-L kernel, Ens: Ensemble, 2-L-SVM-E: two-layer SVM-based ensemble model, CNN: deep Convolutional Neural Network, ERT: Extremely Randomized Trees, MLP: Multilayer Perceptron, XGB: eXtreme Gradient Boosting, MC-SGE: Meta-Classifiers (Stacked Generalized Ensemble).…”
Section: Results
Mentioning, confidence: 99%
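The Yan et al. row quoted above reports 95.60% accuracy under 5-fold cross-validation on a benchmark of 305 T4SEs and 610 non-T4SEs. A small, hypothetical sketch of that evaluation protocol follows; the feature matrix and classifier are stand-ins rather than the published model, and stratified folds are used so each fold preserves the roughly 1:2 class ratio.

```python
# Hedged sketch: stratified 5-fold cross-validation on a 305 vs. 610 class split.
# The feature matrix and classifier are placeholders, not the published model.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.random((305 + 610, 465))          # placeholder feature vectors
y = np.array([1] * 305 + [0] * 610)       # 305 T4SEs, 610 non-T4SEs

accuracies = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
for train_idx, test_idx in skf.split(X, y):
    clf = SVC(kernel="rbf", gamma="scale").fit(X[train_idx], y[train_idx])
    accuracies.append(clf.score(X[test_idx], y[test_idx]))

print("mean 5-fold accuracy:", np.mean(accuracies))
```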
“…The model accuracy range was 82–97%. The problems addressed included: the identification of high-risk snail habitats as a function of Schistosoma japonicum infection [43], modelling of tick bite risk based on ecological factors [44], predicting the global distribution of Aedes mosquitoes and the effects of seasonal changes on their range [45], [46], and the prediction of Dengue virus outbreak risk based on climate [47], [48].…”
Section: Results
Mentioning, confidence: 99%
“…Where f_u is the frequency of occurrence of the 20 amino acids, τ_k is the k-tier sequence correlation factor, and w is the weighting factor for sequence order effects, with w = 0.05 in our study. The λ components can be defined by the user at will (Yan et al., 2020). In this experiment, hydrophilicity, hydrophobicity, mass, pK1, pK2, pI, rigidity, flexibility, and irreplaceability are added, resulting in a 65-dimensional feature vector.…”
Section: Methods
Mentioning, confidence: 99%
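For context, the quantities named in the quoted passage (f_u, τ_k, w, λ) correspond to the standard Chou-type pseudo amino acid composition; the series-correlation variant used by the cited work extends the correlation factors to several physicochemical properties, but the overall form is the same. A sketch of the standard definition, not necessarily the exact formulation in the cited paper:

```latex
% Standard Chou-type pseudo amino acid composition (sketch only; the cited work
% may use the series-correlation variant over several physicochemical properties).
p_u =
\begin{cases}
  \dfrac{f_u}{\sum_{i=1}^{20} f_i + w \sum_{k=1}^{\lambda} \tau_k}, & 1 \le u \le 20,\\[2ex]
  \dfrac{w\,\tau_{u-20}}{\sum_{i=1}^{20} f_i + w \sum_{k=1}^{\lambda} \tau_k}, & 20 < u \le 20 + \lambda,
\end{cases}
\qquad
\tau_k = \frac{1}{L - k} \sum_{i=1}^{L-k} \Theta\!\left(R_i, R_{i+k}\right)
```

Here L is the sequence length and Θ is a correlation function built from the chosen physicochemical properties; the first 20 components reflect amino acid composition and the remaining λ components carry sequence-order information, weighted by w.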
“…Instead, a large number of computational methods have been developed for the prediction of T4SEs in the last decade, which successfully speed up the process in terms of time and efficiency. These computational approaches can be categorized into two main groups: the first group infers new effectors based on sequence similarity with currently known effectors (Chen et al., 2010; Lockwood et al., 2011; Marchesini et al., 2011; Meyer et al., 2013; Sankarasubramanian et al., 2016; Noroy et al., 2019) or phylogenetic profiling analysis (Zalguizuri et al., 2019), and the second group learns the patterns that distinguish known secreted effectors from non-secreted proteins using machine learning and deep learning techniques (Burstein et al., 2009; Lifshitz et al., 2013; Zou et al., 2013; Wang et al., 2014; Ashari et al., 2017; Wang Y. et al., 2017; Esna Ashari et al., 2018, 2019a, b; Guo et al., 2018; Xiong et al., 2018; Xue et al., 2018; Acici et al., 2019; Chao et al., 2019; Hong et al., 2019; Wang J. et al., 2019; Li J. et al., 2020; Yan et al., 2020). Among the latter, Burstein et al. (2009) worked on Legionella pneumophila to identify T4SEs and validated 40 novel effectors that were predicted by machine learning algorithms.…”
Section: Introduction
Mentioning, confidence: 99%