2019
DOI: 10.1002/prot.25801

Boosting phosphorylation site prediction with sequence feature‐based machine learning

Abstract: Protein phosphorylation is one of the essential posttranslation modifications playing a vital role in the regulation of many fundamental cellular processes. We propose a LightGBM‐based computational approach that uses evolutionary, geometric, sequence environment, and amino acid‐specific features to decipher phosphate binding sites from a protein sequence. Our method, while compared with other existing methods on 2429 protein sequences taken from standard Phospho.ELM (P.ELM) benchmark data set featuring 11 org…

citations
Cited by 12 publications
(14 citation statements)
references
References 34 publications
“…With regard to post-mRMR and SU attribute selection, of the PP-based features, AAC, molecular weight, residue volume, flexibility, and partition coefficient of predominantly AA11 turned out to be highly significant for classification, followed by the hydrophobicity of the AAs around the central residue. These features have already been shown to be relevant for discriminating between phosphorylated and non-phosphorylated sites [13, 18]. Table 2 lists the initial number of different feature types used to encode sequence fragments.…”
Section: Results (mentioning)
confidence: 99%
“…Although these techniques provide a vast amount of data when operated in a high-throughput manner, they are laborious, costly, time-consuming, and often produce false positives and false negatives. A large number of PTMs thus remain unidentified or misclassified, and the associated mechanisms in the context of cellular and biological processes are overlooked [13]. The computational prediction of protein phosphorylation sites appears to be a promising alternative strategy for reducing the associated costs and time.…”
Section: Introduction (mentioning)
confidence: 99%
“…[26] In addition, LightGBM proposes gradient-based one-side sampling, exclusive feature bundling, and leaf-wise growth strategy to obtain better accuracy and efficient computation. Meanwhile, it also adopted limiting maximum depth parameters to mitigate over-fitting [55, 56], and LightGBM has been widely used in bioinformatics [57, 58]. ExtraTree is also a tree-based algorithm that was proposed by Pierre Geurts et al.…”
Section: Methods (mentioning)
confidence: 99%
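
The LightGBM options named in the statement above can be illustrated with a parameter dictionary. This is a minimal sketch, not the configuration used by any of the cited papers: the parameter names (`data_sample_strategy`, `enable_bundle`, `num_leaves`, `max_depth`) are taken from LightGBM's documented parameters, where `data_sample_strategy` is assumed to be available (LightGBM 4.0 or later; older releases select GOSS through the `boosting` parameter instead). All values, and the helper `leaves_fit_depth`, are hypothetical and chosen only for illustration.

```python
# Hypothetical values; only the parameter names come from LightGBM itself.
params = {
    "objective": "binary",           # e.g. phosphorylated vs. non-phosphorylated site
    "data_sample_strategy": "goss",  # gradient-based one-side sampling (assumes LightGBM >= 4.0)
    "enable_bundle": True,           # exclusive feature bundling (on by default)
    "num_leaves": 31,                # caps the size of each leaf-wise grown tree
    "max_depth": 6,                  # depth limit used to curb over-fitting
    "learning_rate": 0.05,
}


def leaves_fit_depth(p):
    """Check that num_leaves does not exceed what a tree of max_depth can hold.

    A binary tree of depth d has at most 2**d leaves. In LightGBM a
    max_depth <= 0 means "no limit", so the check passes trivially.
    """
    return p["max_depth"] <= 0 or p["num_leaves"] <= 2 ** p["max_depth"]


print(leaves_fit_depth(params))
```

Because trees are grown leaf-wise, `num_leaves` is the main complexity control and `max_depth` acts as an additional guard; keeping `num_leaves` at or below `2 ** max_depth` keeps the two settings consistent. A dictionary like this would typically be passed to `lightgbm.train` along with a training dataset.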
“…In our case, both precision and recall are essential since we were interested in finding the association of the target variable (high risk of diabetes) with the explanatory variable(s). So, we selected a classification algorithm that maximizes an F1 score metric, a harmonic mean of both recall and precision [47]. A set of classification algorithms was screened based on the F1 score, and the algorithm with the highest F1 score was selected.…”
Section: E Classifier Performance Measures Evaluation (mentioning)
confidence: 99%
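
The selection rule described in the statement above (screen several classifiers and keep the one with the highest F1 score) can be sketched in a few lines. This is an illustrative example only: the model names and confusion-matrix counts are made up, and the function `f1_score` here is a local helper, not a library call.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (true positives, false positives, false negatives) per candidate.
candidates = {
    "model_a": (80, 20, 40),
    "model_b": (70, 10, 30),
    "model_c": (90, 60, 10),
}

# Keep the candidate whose F1 score is highest.
best = max(candidates, key=lambda name: f1_score(*candidates[name]))

for name, counts in candidates.items():
    print(f"{name}: F1 = {f1_score(*counts):.4f}")
print("selected:", best)
```

With these made-up counts, `model_c` has the highest recall and `model_b` the highest precision, and the F1 criterion favors `model_b` because the harmonic mean penalizes an imbalance between the two.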