2019
DOI: 10.1371/journal.pone.0202312

Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila

Abstract: Type IV secretion systems exist in a number of bacterial pathogens and are used to secrete effector proteins directly into host cells in order to change their environment making the environment hospitable for the bacteria. In recent years, several machine learning algorithms have been developed to predict effector proteins, potentially facilitating experimental verification. However, inconsistencies exist between their results. Previously we analysed the disparate sets of predictive features used in these algo…

Cited by 15 publications (20 citation statements)
References 40 publications
“… Wheeler N. E, et al 2018 Classification TD, RF Q1, Q3, Q4 Genomics, Transcriptomics 100% out-of-bag classification accuracy out-of-bag classification accuracy atypical mutations in protein coding genes 29,760,095 [22] Predictive model based on machine learning algorithms to reliably determine malaria infection status in humans based on volatile biomarkers De Moraes CM, et al 2018 Prediction, Classification RF, RRF, AdaBoost Q1, Q2, Q3, Q5, Q6 Proteomics, Metabolomics 0.95, 80, 92 10 Fold cross-validation 17 (4-hydroxy-4-methylpentan-2-one), multiple compounds (compound 49 , 31, 61, 5, 9, 14, 20, 38) 30,416,498 [34] Development of an in silico method to predict whether a protein is an effector of type IV secretion system or not based on its sequence information. Xiong Y, et al 2018 Prediction, Classification NB, KNN, LR, ERT, GBM, XGB, SVM, RF, MC-SGE Q1, Q3 Transcriptomics 73.2, 85.5, 87.9, 89.4, 90.5, 90.1, 90.2, 88.5, metric of F1 5-fold cross-validation, independent test for testing the generalization ability PSSM-composition features 30,682,021 [49] This study focuses on the best way to use validated effector protein features for effector prediction using three machine learning classifiers, and compares results with those of others to obtain de novo results Esna Ashari Z, et al 2019 Classification, Prediction, Clustering SVM, E-SVM Q2, Q3, Q4, Q5 Transcriptomics, Proteomics 94.05%, 93.64%, and 92.44%, for Models 1, 2, and 3, respectively. 10 fold cross-validation Optimal feature set includes 15 features (i.e, coiled coil domains, hydropath, PSSM composites) 31,146,762 [23] Enabling rapid assessment of mosquito blood-feeding histories and vectorial capacities using Mid-infrared spectroscopy and supervised machine learning .…”
Section: Results
Citation type: mentioning
confidence: 99%
“…environmental data in map format, OMICs data in sequences format, etc.). Though this issue was better addressed compared to the missing data issues [32], [34], [49], it was still inadequately addressed by some of the studies identified in our review. In machine learning, imbalanced data input can hamper model performance and contribute to inaccuracy.…”
Section: Discussion
Citation type: mentioning
confidence: 96%
“…Instead, a large number of computational methods have been developed for prediction of T4SEs in the last decade, which successfully speed up the process in terms of time and efficiency. These computational approaches can be categorized into two main groups: the first group of approaches infer new effectors based on sequence similarity with currently known effectors (Chen et al, 2010; Lockwood et al, 2011; Marchesini et al, 2011; Meyer et al, 2013; Sankarasubramanian et al, 2016; Noroy et al, 2019) or phylogenetic profiling analysis (Zalguizuri et al, 2019), and the second group of approaches involve learning the patterns of known secreted effectors that distinguish them from non-secreted proteins based on machine learning and deep learning techniques (Burstein et al, 2009; Lifshitz et al, 2013; Zou et al, 2013; Wang et al, 2014; Ashari et al, 2017; Wang Y. et al, 2017; Esna Ashari et al, 2018, 2019a, b; Guo et al, 2018; Xiong et al, 2018; Xue et al, 2018; Acici et al, 2019; Chao et al, 2019; Hong et al, 2019; Wang J. et al, 2019; Li J. et al, 2020; Yan et al, 2020). In the latter group of methods, Burstein et al (2009) worked on Legionella pneumophila to identify T4SEs and validated 40 novel effectors which were predicted by machine learning algorithms.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…As a result of the disparities between the results of earlier methods, we assembled all the features used in prior studies and used a multi-level, statistical approach to determine which were the most effective in predicting effector proteins (Esna Ashari et al, 2017, 2018). Because of the number of validated effectors available for L. pneumophila, we then ran a number of experiments on the whole genome of L. pneumophila using our optimal set of features (Esna Ashari et al, 2019). A comparison of our results with the list of validated effectors and those of previous studies was highly encouraging.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In addition to applying our model for T4SS effector prediction to A. phagocytophilum, we also improved it based on what we learned from our previous study (Esna Ashari et al, 2019) and expanded the code to make it easy for microbiologists to use for other bacteria with T4 secretion systems. We created a software package called OPT4e, for Optimal-features Predictor for T4SS Effector proteins, that performs all the steps described in our previous studies as well as incorporating new steps, including automation of feature evaluation which is very time consuming for whole proteomes.…”
Section: Introduction
Citation type: mentioning
confidence: 99%