Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition

Huang, Hui Ling; Charoenkwan, Phasit; Kao, Te Fen; Lee, Hua Chin; Chang, Fang Lin; Huang, Wen; Shu, Li; Chen, Wenliang; Ho, Shinn-Ying

doi:10.1186/1471-2105-13-s17-s3

Cited by 60 publications

(72 citation statements)

References 30 publications


“…Higher helix propensity has been reported to increase solubility (Idicula-Thomas and Balaji 2005; Huang et al 2012). However, our analysis has shown that helical and turn propensities anti-correlate with solubility, whereas sheet propensity lacks correlation with solubility, suggesting that disordered regions may tend to be more soluble (Fig 3).…”

Section: Discussion (mentioning)

confidence: 99%

Solubility-Weighted Index: fast and accurate prediction of protein solubility

Bhandari, Geeleher, Lim. 2020. Preprint.

Motivation: Recombinant protein production is a widely used technique in the biotechnology and biomedical industries, yet only a quarter of target proteins are soluble and can therefore be purified. Results: We have discovered that global structural flexibility, which can be modeled by normalised B-factors, accurately predicts the solubility of 12,216 recombinant proteins expressed in Escherichia coli. We have optimised B-factors, and derived a new set of values for solubility scoring that further improves prediction accuracy. We call this new predictor the 'Solubility-Weighted Index' (SWI). Importantly, SWI outperforms many existing protein solubility prediction tools. Furthermore, we have developed 'SoDoPE' (Soluble Domain for Protein Expression), a web interface that allows users to choose a protein region of interest for predicting and maximising both protein expression and solubility. Availability: The SoDoPE web server and source code are freely available at https://tisigner.com/sodope and https://github.com/Gardner-BinfLab/TISIGNER-ReactJS, respectively.


“…To make a fair comparison with the existing PVP predictors [9,10,12-14], the same benchmark and independent datasets that have been used in previous studies [12] were used to develop our proposed model. Due to the non-deterministic characteristic of the GA algorithm [26,32], ten SCM models in conjunction with ten different optimized dipeptide propensity scores (opti-DPS) [21-27,38] were performed to generate ten different prediction results. Tables 2 and 3 list the performance comparisons of ten independent runs evaluated by 10-fold CV and independent validation test, respectively.…”

Section: Prediction Performance (mentioning)

confidence: 99%

“…Owing to the complex architecture of computational models and low interpretable features used in the study, it is not easy to identify and assess which features are beneficial for the biological activities of PVPs. As mentioned in a series of recent publications [17-33] and summarized in several comprehensive review papers [29,34-36], one of the main values of bioinformatics tools should be its ability to provide insight into mechanisms of action under study. Secondly, few existing methods were not assessed using an independent dataset, indicating that these methods might provide misleading results with overestimated accuracy.…”

Section: Introduction (mentioning)

confidence: 99%

PVPred-SCM: Improved Prediction and Analysis of Phage Virion Proteins Using a Scoring Card Method

Charoenkwan, Kanthawong, Schaduangrat, et al. 2020. Cells. Self-citation.

Although existing methods have been successful in predicting phage (or bacteriophage) virion proteins (PVPs) using various types of protein features and complex classifiers, such as support vector machine and naïve Bayes, these two methods do not allow interpretability. However, the characterization and analysis of PVPs might be of great significance to understanding the molecular mechanisms of bacteriophage genetics and the development of antibacterial drugs. Hence, we herein proposed a novel method (PVPred-SCM) based on the scoring card method (SCM) in conjunction with dipeptide composition to identify and characterize PVPs. In PVPred-SCM, the propensity scores of 400 dipeptides were calculated using the statistical discrimination approach. A rigorous independent validation test showed that PVPred-SCM utilizing only dipeptide composition yielded an accuracy of 77.56%, indicating that PVPred-SCM performed well relative to the state-of-the-art method utilizing a number of protein features. Furthermore, the propensity scores of dipeptides were used to provide insights into the biochemical and biophysical properties of PVPs. Upon comparison, it was found that PVPred-SCM was superior to the existing methods considering its simplicity, interpretability, and implementation. Finally, in an effort to facilitate high-throughput prediction of PVPs, we provided a user-friendly web-server for identifying the likelihood of whether or not these sequences are PVPs. It is anticipated that PVPred-SCM will become a useful tool, or at least a complement to existing methods, for predicting and analyzing PVPs.
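The scoring idea described in this abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the "statistical discrimination approach" is read here as the difference in mean dipeptide composition between the positive and negative classes, rescaled to the range 0 to 1000, and a sequence score is taken as the composition-weighted sum of those propensity scores. The function names are hypothetical, and the genetic-algorithm optimization of the scores mentioned in the citation statements is omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the fraction of each of the 400 dipeptides in seq."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return {d: 0.0 for d in DIPEPTIDES}
    return {d: c / total for d, c in counts.items()}


def mean_composition(seqs):
    """Average the dipeptide composition over a list of sequences."""
    comps = [dipeptide_composition(s) for s in seqs]
    return {d: sum(c[d] for c in comps) / len(comps) for d in DIPEPTIDES}


def initial_propensity_scores(positives, negatives):
    """Assumed scoring rule: class difference in mean composition,
    min-max rescaled to 0..1000 (no GA optimization)."""
    pos = mean_composition(positives)
    neg = mean_composition(negatives)
    diff = {d: pos[d] - neg[d] for d in DIPEPTIDES}
    lo, hi = min(diff.values()), max(diff.values())
    if hi == lo:  # no discriminating signal; assign a neutral score
        return {d: 500.0 for d in DIPEPTIDES}
    return {d: 1000.0 * (v - lo) / (hi - lo) for d, v in diff.items()}


def scm_score(seq, scores):
    """Score a sequence as the composition-weighted sum of propensities."""
    comp = dipeptide_composition(seq)
    return sum(comp[d] * scores[d] for d in DIPEPTIDES)


# Toy sequences for illustration only; they are not real proteins.
toy_scores = initial_propensity_scores(["AAAA", "AAAC"], ["CCCC", "CCCA"])
toy_high = scm_score("AAAA", toy_scores)
toy_low = scm_score("CCCC", toy_scores)
```

In this sketch a sequence would be assigned to the positive class when its score exceeds a threshold chosen on training data. Because each dipeptide carries a single readable score, the highest- and lowest-scoring dipeptides can be inspected directly, which is the interpretability argument the abstract makes.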

“…For the machine/deep learning techniques, several sequence-based methods have been developed for protein solubility prediction including PROSO II (Smialowski, et al, 2012), CCSOL (Agostini, et al, 2012), SOLpro (Magnan, et al, 2009), and the scoring card method (SCM) (Huang, et al, 2012). The majority of these methods adopted the support vector machine (SVM) (AK, 2002) as the core discriminative model on biologically relevant handcrafted features from protein sequences to discriminate the soluble and insoluble proteins.…”

Section: Introduction (mentioning)

confidence: 99%

Structure-aware Protein Solubility Prediction From Sequence Through Graph Convolutional Network And Predicted Contact Map

Chen, Zheng, Zhao, et al. 2020. Preprint.

Motivation: Protein solubility is significant in producing new soluble proteins that can reduce the cost of biocatalysts or therapeutic agents. Therefore, a computational model is highly desired to accurately predict protein solubility from the amino acid sequence. Many methods have been developed, but they are mostly based on the one-dimensional embedding of amino acids that is limited to catch spatially structural information. Results: In this study, we have developed a new structure-aware method to predict protein solubility by attentive graph convolutional network (GCN), where the protein topology attribute graph was constructed through predicted contact maps from the sequence. GraphSol was shown to substantially outperform other sequence-based methods. The model was proven to be stable by consistent R² of 0.48 in both the cross-validation and independent test of the eSOL dataset. To our best knowledge, this is the first study to utilize the GCN for sequence-based predictions. More importantly, this architecture could be extended to other protein prediction tasks. Availability: The package is available at http://biomed.nscc-gz.cn
