SPINE X: Improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles

Faraggi, Eshel; Zhang, Tuo; Yang, Yuedong; Kurgan, Lukasz; Zhou, Yaoqi

doi:10.1002/jcc.21968

Cited by 227 publications

(226 citation statements)

References 45 publications

Supporting

Mentioning

218

Contrasting

Unclassified


“…Yu et al (2017) use Chou's pseudo amino acid composition and wavelet denoising to predict structural class. From 2014 to now, several papers (Dehzangi et al, 2014;Wang et al, 2014;Jones, 1999;Faraggi et al, 2012) show that the protein secondary structure is significant for predicting protein structural classes. First the features are extracted; then various algorithms can be used to implement the classification, such as Fisher's linear discriminant algorithm (Yang et al, 2009), Support Vector Machine (SVM) (Cai et al, 2003) and so on.…”

Section: Introduction (mentioning)

confidence: 99%

Low-Homology Protein Structural Class Prediction from Secondary Structure Based on Visibility and Horizontal Visibility Network

Zhao, Luo, Liu

2018

American Journal of Biochemistry and Biotechnology


In this study, based on the predicted secondary structures of proteins, we propose a new approach to predict protein structural classes (α, β, α/β, α+β) for three widely used low-homology data sets. First, we obtain two time series from the chaos game representation of each predicted secondary structure; second, based on the two time series, we construct visibility and horizontal visibility networks, respectively, and generate a set of features using 17 network features; finally, we predict each protein structural class using support vector machine and Fisher's linear discriminant algorithm, respectively. To evaluate our method, the leave-one-out cross-validation test is employed on the three data sets. Results show that our approach is an effective tool for the prediction of low-homology protein structural classes.
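The network construction step in this abstract can be illustrated with a minimal Python sketch. It uses the standard definitions of the natural and horizontal visibility graphs; the toy series, the function names, and the choice of mean degree as an example feature are assumptions made for illustration, since the abstract does not list the 17 network features or the exact chaos game encoding.

```python
from itertools import combinations


def visibility_edges(series):
    """Natural visibility: i and j are linked when every point between them
    lies strictly below the straight line joining (i, x_i) and (j, x_j)."""
    edges = set()
    for i, j in combinations(range(len(series)), 2):
        xi, xj = series[i], series[j]
        if all(series[k] < xj + (xi - xj) * (j - k) / (j - i)
               for k in range(i + 1, j)):
            edges.add((i, j))
    return edges


def horizontal_visibility_edges(series):
    """Horizontal visibility: i and j are linked when every point between
    them is strictly lower than both x_i and x_j."""
    edges = set()
    for i, j in combinations(range(len(series)), 2):
        floor = min(series[i], series[j])
        if all(series[k] < floor for k in range(i + 1, j)):
            edges.add((i, j))
    return edges


def mean_degree(edges, n_nodes):
    """One simple network feature: the average number of links per node."""
    return 2 * len(edges) / n_nodes


# Hypothetical series, standing in for one coordinate of a chaos game walk.
toy_series = [3.0, 1.0, 2.0, 4.0]
hvg = horizontal_visibility_edges(toy_series)
vg = visibility_edges(toy_series)
print(sorted(hvg), mean_degree(hvg, len(toy_series)))
print(sorted(vg), mean_degree(vg, len(toy_series)))
```

Every horizontal visibility link is also a natural visibility link, so the second network is always at least as dense as the first. The pairwise loop is quadratic in the series length, which is acceptable for a sketch but may be slow for long sequences.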



“…We also use predicted secondary structure using SPINE-X which was recently proposed by [46] and attained better results (especially for the coded area) than PSIPRED on predicting protein secondary structure [47]. Given a protein sequence, it returns an L × 3 matrix (which will be referred to as SPINE-M for the rest of this study) consisting of the normalized probability of contribution of a given amino acid based on its position along the protein sequence to build one of the three secondary structure elements, namely α-helix, β-strands, and coils.…”

Section: Feature Extraction Methods (mentioning)

confidence: 99%

“…In this study, we will refer to this sequence as the structural consensus sequence. It is expected that predicted secondary structure using SPINE-X provides significant structural information for the PFR similar to or even better than PSIPRED due to its better performance [17], [23], [30], [46].…”

Section: Feature Extraction Methods (mentioning)

confidence: 99%
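The L × 3 probability matrix described in the statements above can be pictured with a short Python sketch. It row-normalizes a hypothetical matrix and reads off the most probable state per residue. The column order (helix, strand, coil), the H/E/C letters, and the function names are assumptions for illustration; this is not the SPINE-X output format, nor necessarily how the citing paper derives its structural consensus sequence.

```python
STATES = "HEC"  # assumed column order: helix, strand, coil


def normalize_rows(matrix):
    """Scale each row so that its three values sum to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        if total <= 0:
            raise ValueError("each row needs a positive sum")
        normalized.append([value / total for value in row])
    return normalized


def most_likely_states(matrix):
    """Pick the highest-probability state for every residue."""
    return "".join(
        STATES[max(range(len(STATES)), key=lambda s: row[s])]
        for row in matrix
    )


# Hypothetical scores for a three-residue sequence.
toy_matrix = normalize_rows([[7, 2, 1], [1, 6, 3], [2, 2, 6]])
print(most_likely_states(toy_matrix))
```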

“…As it is highlighted in [49], the most sensitive methods for fold recognition use sequence profiles to represent both the query and the data base proteins. The robustness and sensitivity of PSSM and SPINE-X for feature extraction have been addressed in [23], [46], [49]. In continuation, the global and local features extracted in this study will be explained in detail.…”

Section: Feature Extraction Methods (mentioning)

confidence: 99%


A Segmentation-Based Method to Extract Structural and Evolutionary Features for Protein Fold Recognition

Dehzangi, Paliwal, Lyons, et al.

2014

IEEE/ACM Trans. Comput. Biol. and Bioinf.


Abstract: Protein fold recognition (PFR) is considered an important step towards solving the protein structure prediction problem. Despite all the efforts that have been made so far, finding an accurate and fast computational approach to PFR remains a challenging problem for bioinformatics and computational biology. In this study, we propose a segmentation-based feature extraction technique to provide local evolutionary information embedded in the position specific scoring matrix (PSSM) and structural information embedded in the secondary structure of proteins predicted using SPINE-X. We also employ the concept of occurrence features to extract global discriminatory information from PSSM and SPINE-X. By applying a support vector machine (SVM) to our extracted features, we enhance the protein fold prediction accuracy by 7.4 percent over the best results reported in the literature. We also report 73.8 percent prediction accuracy for a data set consisting of proteins with less than 25 percent sequence similarity and 80.7 percent prediction accuracy for a data set with proteins belonging to 110 folds with less than 40 percent sequence similarity. We also investigate the relation between the number of folds and the number of features being used and show that the number of features should be increased to get better protein fold prediction results when the number of folds is relatively large.


“…The residue level information includes: (a) single valued amino acid type (all the necessary information for the correct folding of a protein is encoded in its amino acid sequence [26]); (b) seven physicochemical properties of the amino acid (different types of disordered regions in proteins, short or long, are found to have distinct physicochemical properties); (c) twenty PSSM (position specific scoring matrix) values indicating the evolutionary information accumulated in each residue position of a protein sequence; (d) three predicted secondary structure (helix, strand and coil) probabilities from SPINE-X [27], one predicted accessible surface area (ASA) normalized by the ASA of an extended conformation (Ala-X-Ala) [28] and two predicted backbone torsion angle (phi, psi) fluctuations [29], since disordered residues are characterized by lack of stable secondary structure [30], highly exposed area and angle fluctuations; (e) one monogram and twenty bigrams computed from PSSM [31] representing the conserved evolutionary information of PSSM transformed from primary structure level to three dimensional structure level, which are normalized by the median of the normal density distribution of monogram and bigram values in their logarithmic space; (f) one indicator for terminal residues (five residues from the N-terminal as {−1.0, −0.8, −0.6, −0.4, −0.2}, five residues from the C-terminal as {+1.0, +0.8, +0.6, +0.4, +0.2} respectively, with the rest as 0.0). Finally, before feeding the features into the classifier, neighboring residues' information is aggregated using a sliding window of 21 residues (10 residues on each side of the residue to be predicted), resulting in 21 × 56 = 1176 features per residue.…”

Section: B Input Features (mentioning)

confidence: 99%
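The window arithmetic in the statement above (56 per-residue features, a 21-residue window, 21 × 56 = 1176 inputs) and the terminal indicator in item (f) can be sketched in Python as follows. Two details are assumptions, because the quoted text does not specify them: window positions that fall outside the sequence are zero-padded, and the outermost residue at each terminus receives the largest magnitude. The function names are hypothetical.

```python
def terminal_indicator(length):
    """Terminal flag per residue: negative ramp at the N-terminus,
    positive ramp at the C-terminus, 0.0 elsewhere.
    Assumes the outermost residue gets magnitude 1.0."""
    if length < 10:
        raise ValueError("sketch assumes at least 10 residues")
    ramp = [1.0, 0.8, 0.6, 0.4, 0.2]
    values = [0.0] * length
    for offset, magnitude in enumerate(ramp):
        values[offset] = -magnitude
        values[length - 1 - offset] = magnitude
    return values


def window_features(per_residue, half_window=10):
    """Concatenate each residue's features with those of its neighbours.
    Positions beyond either end of the sequence are zero-padded (assumed)."""
    length = len(per_residue)
    padding = [0.0] * len(per_residue[0])
    rows = []
    for center in range(length):
        row = []
        for pos in range(center - half_window, center + half_window + 1):
            row.extend(per_residue[pos] if 0 <= pos < length else padding)
        rows.append(row)
    return rows


# Hypothetical 30-residue protein with 56 placeholder features per residue.
toy_features = [[0.0] * 56 for _ in range(30)]
windowed = window_features(toy_features)
print(len(windowed), len(windowed[0]))
```

The per-residue count of 56 follows from the list in the statement: 1 + 7 + 20 + (3 + 1 + 2) + (1 + 20) + 1.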

Improved protein disorder predictor by smoothing output

Iqbal, Islam, Hoque

2014

2014 17th International Conference on Computer and Information Technology (ICCIT)


Abstract: Intrinsically disordered regions (IDRs) or proteins (IDPs) are associated with important biological functions while lacking stable structure in their native state. Disordered proteins or residues are abundant in nature and are extensively involved in critical human diseases, and hence impact drug discovery. Thus, disorder prediction is becoming crucial in proteomic research. The large-scale growth of genome databases demands high-performance computational methods for the identification of protein disorder. We developed a canonical support vector machine based disorder predictor, DisPredict, by integrating an RBF kernel. It employs a novel feature set for accurate characterization of disorder and outperformed two leading predictors, the neural network based SPINE-D and the meta predictor MFDp, based on ten-fold cross validation. We propose a post-processing of probabilities to further improve the accuracy, named DisPredict1.1, which further improves performance both in binary annotation and in real-valued probability prediction per residue, in both short and long disordered regions. It provides the highest Matthews Correlation Coefficient (MCC), a competitive Area Under the receiver operating characteristic Curve (AUC) and the lowest Mean Absolute Error (MAE) when compared with twenty existing predictors of several kinds on an independent benchmark dataset. DisPredict is available online.

