2012
DOI: 10.1186/1471-2105-13-s17-s3

Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition

Abstract: Background: Existing methods for predicting protein solubility on overexpression in Escherichia coli advance performance by using ensemble classifiers such as two-stage support vector machine (SVM) based classifiers and a number of feature types such as physicochemical properties, amino acid and dipeptide composition, accompanied with feature selection. It is desirable to develop a simple and easily interpretable method for predicting protein solubility, compared to existing complex SVM-based method…
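
To make the "dipeptide composition" feature named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the fraction of each of the 400 possible dipeptides in a sequence and combines those fractions with a table of per-dipeptide propensity scores into a single weighted-sum score. The abstract is truncated and does not spell out the scoring rule, so the weighted sum, the function names, the example scores, and the threshold are all assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids and the 400 ordered pairs built from them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the fraction of each dipeptide among all overlapping pairs.

    Pairs containing a non-standard residue are skipped (an assumption).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {dp: 0.0 for dp in DIPEPTIDES}
    return {dp: count / total for dp, count in counts.items()}


def scoring_card_score(sequence, propensity):
    """Weighted sum of dipeptide fractions and propensity scores.

    `propensity` is a hypothetical dict mapping dipeptide -> score;
    dipeptides missing from it contribute zero.
    """
    composition = dipeptide_composition(sequence)
    return sum(composition[dp] * propensity.get(dp, 0.0) for dp in DIPEPTIDES)


def predict_soluble(sequence, propensity, threshold):
    """Classify as soluble when the score exceeds a chosen threshold."""
    return scoring_card_score(sequence, propensity) > threshold


if __name__ == "__main__":
    # Illustrative scores only; real scores would be learned from data.
    example_scores = {"AA": 300.0, "AC": 600.0}
    print(scoring_card_score("AAAC", example_scores))
    print(predict_soluble("AAAC", example_scores, threshold=350.0))
```

A single additive score of this kind stays easy to inspect: each dipeptide's contribution to a prediction can be read directly from the score table.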

Cited by 60 publications (72 citation statements)
References 30 publications
“…Higher helix propensity has been reported to increase solubility (Idicula-Thomas and Balaji 2005; Huang et al 2012). However, our analysis has shown that helical and turn propensities anti-correlate with solubility, whereas sheet propensity lacks correlation with solubility, suggesting that disordered regions may tend to be more soluble (Fig 3).…”
Section: Discussion (mentioning)
confidence: 99%
“…To make a fair comparison with the existing PVP predictors [9,10,12-14], the same benchmark and independent datasets that have been used in previous studies [12] were used to develop our proposed model. Due to the non-deterministic characteristic of the GA algorithm [26,32], ten SCM models in conjunction with ten different optimized dipeptide propensity scores (opti-DPS) [21-27,38] were performed to generate ten different prediction results. Tables 2 and 3 list the performance comparisons of ten independent runs evaluated by 10-fold CV and independent validation test, respectively.…”
Section: Prediction Performance (mentioning)
confidence: 99%
“…Owing to the complex architecture of computational models and low interpretable features used in the study, it is not easy to identify and assess which features are beneficial for the biological activities of PVPs. As mentioned in a series of recent publications [17-33] and summarized in several comprehensive review papers [29,34-36], one of the main values of bioinformatics tools should be its ability to provide insight into mechanisms of action under study. Secondly, few existing methods were not assessed using an independent dataset, indicating that these methods might provide misleading results with overestimated accuracy.…”
Section: Introduction (mentioning)
confidence: 99%
“…For the machine/deep learning techniques, several sequence-based methods have been developed for protein solubility prediction including PROSO II (Smialowski, et al, 2012), CCSOL (Agostini, et al, 2012), SOLpro (Magnan, et al, 2009), and the scoring card method (SCM) (Huang, et al, 2012). The majority of these methods adopted the support vector machine (SVM) (AK, 2002) as the core discriminative model on biologically relevant handcrafted features from protein sequences to discriminate the soluble and insoluble proteins.…”
Section: Introduction (mentioning)
confidence: 99%