2022
DOI: 10.1038/s41598-022-08173-5

SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins

Abstract: Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain limitations. Therefore, in this study, we propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences…

Citations: cited by 26 publications (12 citation statements)
References: 55 publications
“…In a sense, some traditional models, such as regression models, also have good explainability, as we can assess the coefficients of each attribute to measure how important a feature is. These models, however, do not measure up in terms of effectiveness when compared to modern tree-based algorithms in many scenarios, especially in cases with larger datasets [33]. Another key difference between these models is that, in the case of regression models, we have to explicitly remove collinear variables, but these variables, even though they might not improve classification performance, still yield valid model explanations.…”
Section: Results | Citation type: mentioning
confidence: 99%
“…The combination of models based on different classification premises potentially made Stacking more robust. If a single classifier makes a wrong prediction, the others can still make corrections, increasing the robustness of the final stacking model [32][33][34][35].…”
Section: Results | Citation type: mentioning
confidence: 99%
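
The correction effect described in this statement can be illustrated with a minimal stacking sketch. Everything here is an assumption made for illustration and is not taken from SCORPION or the citing paper: the three base models are hand-set threshold rules rather than trained classifiers, the toy data are invented, and the meta-learner is a simple lookup table. A real stacking pipeline would train the base models and feed the meta-learner out-of-fold predictions.

```python
from collections import Counter, defaultdict

def make_stump(feature, threshold):
    """Base learner: predicts 1 when x[feature] > threshold, else 0."""
    return lambda x: int(x[feature] > threshold)

def fit_meta(base_models, X_meta, y_meta):
    """Meta-learner: map each pattern of base predictions to its majority label."""
    votes = defaultdict(Counter)
    for x, y in zip(X_meta, y_meta):
        pattern = tuple(m(x) for m in base_models)
        votes[pattern][y] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}

    def predict(x):
        pattern = tuple(m(x) for m in base_models)
        if pattern in table:
            return table[pattern]
        # Unseen pattern: fall back to a plain majority vote of the base models.
        return int(sum(pattern) * 2 > len(pattern))

    return predict

# Hypothetical base models, one per feature (assumed, not trained).
BASE_MODELS = [make_stump(0, 0.5), make_stump(1, 0.5), make_stump(2, 0.5)]

# Invented held-out data used only to fit the meta-learner.
X_META = [(0.9, 0.8, 0.7), (0.8, 0.9, 0.1), (0.1, 0.2, 0.3),
          (0.2, 0.1, 0.9), (0.9, 0.2, 0.8), (0.1, 0.9, 0.2)]
Y_META = [1, 1, 0, 0, 1, 0]

stacked = fit_meta(BASE_MODELS, X_META, Y_META)
```

For the input `(0.7, 0.6, 0.2)` the third base model votes 0 while the other two vote 1; the meta-learner has seen that pattern paired with label 1, so the stacked prediction is 1 despite the dissenting vote.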
“…In order to enhance the performance of SVM classifiers, a grid search strategy was utilized to optimize the two important aspects of the RBF kernel, including C (controls the trade-off between the misclassification rate and margin) and γ (the kernel width parameter). Although SVM often yields satisfactory prediction performances, this method is known as a black-box computation method (Ahmad et al., 2022 [1]; Charoenkwan et al., 2021 [12]; Li et al., 2021 [33]; Wei et al., 2021 [54]).…”
Section: Methods | Citation type: mentioning
confidence: 99%
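
The two ingredients named in this statement, the RBF kernel and an exhaustive search over (C, γ) pairs, can be sketched as follows. This is an illustrative sketch under stated assumptions: the grid ranges are hypothetical (the statement does not give them), and `score_fn` is a placeholder for whatever evaluation the authors used, typically the cross-validated accuracy of an SVM trained with the given pair.

```python
import math
from itertools import product

def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2); larger gamma gives a narrower kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def grid_search(score_fn, c_values, gamma_values):
    """Try every (C, gamma) pair and return the best pair with its score.

    score_fn is a placeholder: in practice it would train an RBF-kernel SVM
    with the given C and gamma and return a validation score.
    """
    best_params, best_score = None, float("-inf")
    for c, gamma in product(c_values, gamma_values):
        score = score_fn(c, gamma)
        if score > best_score:
            best_params, best_score = (c, gamma), score
    return best_params, best_score

# Hypothetical log-spaced grids; the source does not state the actual ranges.
C_VALUES = [2.0 ** k for k in range(-5, 16, 2)]
GAMMA_VALUES = [2.0 ** k for k in range(-15, 4, 2)]
```

Log-spaced grids are a common choice because both parameters act multiplicatively, so the useful values span several orders of magnitude.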
“…A common solution is only to store k-mers that appear two or more times and to use a probabilistic data structure like a Bloom Filter or Counting Quotient Filter to remove singleton k-mers [53]. Other valuable numerical features include physicochemical properties, such as isoelectric point, aromaticity, molar extinction coefficient, instability index, molecular weight, polarity and hydrophobicity [54][55][56]. Each of these features can be calculated from the sequence alone using readily available bioinformatics tools [57].…”
Section: Homology-free Annotation Extracting Protein Sequence Features | Citation type: mentioning
confidence: 99%
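
As a rough illustration of the sequence-only features mentioned in this statement, the sketch below counts overlapping k-mers, drops singletons, and computes aromaticity. It makes two simplifying assumptions: an exact `Counter` stands in for the probabilistic filters the statement mentions (Bloom Filter, Counting Quotient Filter), and aromaticity is taken as the fraction of F, W and Y residues. The function names and the example sequence are invented for illustration.

```python
from collections import Counter

def count_kmers(seq, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def drop_singletons(counts):
    """Keep only k-mers seen two or more times."""
    return {kmer: n for kmer, n in counts.items() if n >= 2}

def aromaticity(seq):
    """Fraction of aromatic residues (F, W, Y) in a protein sequence."""
    if not seq:
        return 0.0
    return sum(seq.count(aa) for aa in "FWY") / len(seq)

# Invented toy sequence, not a real phage protein.
EXAMPLE = "MKFWMKFY"
kmers = drop_singletons(count_kmers(EXAMPLE, 3))
```

An exact counter holds every distinct k-mer in memory, which is the cost the probabilistic structures are meant to avoid on large datasets.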
“…Once a set of features has been curated, machine learning techniques are applied to differentiate between protein functions. These techniques include Support Vector Machines [58][59][60] and Artificial Neural Networks [54], as well as ensembles of multiple machine learning methods [55,56]. Several function predictors have been released as bioinformatic tools, including PhANNs (Phage Artificial Neural Networks) [54], which can classify phage proteins into one of ten structural classes with 86.2% accuracy.…”
Section: Homology-free Annotation Extracting Protein Sequence Features | Citation type: mentioning
confidence: 99%