Context-Based Features Enhance Protein Secondary Structure Prediction Accuracy

Yaseen, Ashraf; Li, Yaohang

doi:10.1021/ci400647u

Cited by 59 publications

(61 citation statements)

References 39 publications

“…In this work, similar to our previous work employed in DINOSOLVE [29], SCORPION [30], and CASA, we collect statistics of singlets, doublets, and triplets residues at different positions in protein chains in a window of size 7 residues. These statistics represent approximations of the possibilities of residues adopting certain flexibility states when none, one, or two neighboring residues are considered.…”

Section: Methods (mentioning)

confidence: 99%
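
The counting step described in the statement above can be sketched as follows. This is a minimal illustration, not the FLEXc or SCORPION implementation: the function name `context_counts`, the state labels, and the choice to key each count by window offsets, residues, and the state of the central residue are all assumptions. The conversion of counts into mean-field potential scores is omitted.

```python
from collections import Counter
from itertools import combinations

WINDOW = 7          # window size quoted in the citation statement
HALF = WINDOW // 2  # residues considered on each side of the centre


def context_counts(sequence, states, order):
    """Count residue groups of size `order` around each central residue.

    order=1 gives singlets, order=2 doublets, order=3 triplets.
    Keys are (offsets, residues, centre_state); this keying is an
    illustrative assumption, not the published scheme.
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have the same length")
    counts = Counter()
    for i, state in enumerate(states):
        # Offsets relative to the centre that stay inside the chain.
        offsets = [k for k in range(-HALF, HALF + 1)
                   if 0 <= i + k < len(sequence)]
        for combo in combinations(offsets, order):
            residues = tuple(sequence[i + k] for k in combo)
            counts[(combo, residues, state)] += 1
    return counts


# Toy chain with hypothetical state labels (R = rigid, F = flexible).
singlets = context_counts("ACDA", "RRFF", order=1)
doublets = context_counts("ACDA", "RRFF", order=2)
triplets = context_counts("ACDA", "RRFF", order=3)
```

Keeping the offsets in the key lets the same residue pair contribute separately at different positions in the window, which matches the "at different positions" wording of the statement; a real predictor would normalize these counts against background frequencies before using them as features.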

“…We use the methods SCORPION [30] and CASA for secondary structure and solvent accessibility predictions, respectively.…”

Section: Methods (mentioning)

confidence: 99%

“…We describe the approaches of extracting statistical scores to measure the favorability of residues’ flexibility in the presence of their surrounding neighbors in sequence from a large training dataset based on the mean-field potentials [28]. These approaches were successfully applied in our previous work for predicting protein disulfide bonding [29], secondary structures [30, 31], and solvent accessibility. The basic idea is based on the observation that residues’ flexibility exhibits strong local dependency.…”

Section: Introduction (mentioning)

confidence: 99%

FLEXc: protein flexibility prediction using context-based statistics, predicted structural features, and sequence information

et al. 2016

Self Cite

Background: The fluctuation of atoms around their average positions in protein structures provides important information regarding protein dynamics. This flexibility of protein structures is associated with various biological processes. Predicting flexibility of residues from protein sequences is significant for analyzing the dynamic properties of proteins which will be helpful in predicting their functions.

Results: In this paper, an approach of improving the accuracy of protein flexibility prediction is introduced. A neural network method for predicting flexibility in 3 states is implemented. The method incorporates sequence and evolutionary information, context-based scores, predicted secondary structures and solvent accessibility, and amino acid properties. Context-based statistical scores are derived, using the mean-field potentials approach, for describing the different preferences of protein residues in flexibility states taking into consideration their amino acid context. The 7-fold cross validated accuracy reached 61% when context-based scores and predicted structural states are incorporated in the training process of the flexibility predictor.

Conclusions: Incorporating context-based statistical scores with predicted structural states are important features to improve the performance of predicting protein flexibility, as shown by our computational results. Our prediction method is implemented as web service called “FLEXc” and available online at: http://hpcr.cs.odu.edu/flexc.

“…Computational methods for PSSP, mostly based on machine learning methods, can be schematically grouped in the following three families: sequence-based methods; network-based methods [9]; hierarchical ensemble methods [10]. Some methods provided predictions of a relatively small set of functional classes [11], while others considered predictions extended to larger sets, using Support Vector Machines [12], HMM Algorithm, artificial neural networks, Bayesian networks, Decision Trees or methods that combine functional linkage networks with learning machines using a logistic regression model or simple algebraic operators.…”

Section: Structure Prediction: Concepts and Techniques (mentioning)

confidence: 99%

A Distributed Tree-based Ensemble Learning Approach for Efficient Structure Prediction of Protein

Xavier, Ramkumar 2017

IJIES

Knowledge of a protein's secondary structure contributes to our understanding of the protein's functions, which are vital to many aspects of living organisms, such as those of enzymes, hormones, and structural material. It also helps in designing new drugs for critical diseases. In this paper, we advocate a distributed approach to identifying protein secondary structures using an ensemble method on protein primary sequences. The ensemble-based Random Forest algorithm has been adopted to build the three-way predictive model. Based on the amino acid features of each protein and decision tree parameters, the classification model allows us to assign protein structures as 'α helix', 'β sheet', or coil. The proposed model is also implemented in a distributed computing environment, SPARK. Experiments have been carried out using cross-validation tests on the RS126 and CB513 benchmark datasets. Our results confirm that the ensemble approach to classifying protein secondary structures achieves better accuracy with improved performance when implemented in the distributed environment.

“…Sometimes it is essential to know protein 3D structures to identify the protein functions at a molecular level. Reliably and accurately predicting protein 3D structure from sequences of proteins is one of the most challenging issues in computational biology [1]. Protein secondary structure prediction is a vital step towards predicting protein tertiary (3D) structure [2], prediction of protein disorder [3], and solvent accessibility prediction [4].…”

Section: Introduction (mentioning)

confidence: 99%

Prediction of 8-state protein secondary structures by 1D-Inception and BD-LSTM

Ratul, Turcotte, Mozaffari et al. 2019

Preprint

Protein secondary structure is crucial for creating an information bridge between the primary structure and the tertiary (3D) structure. Precise prediction of 8-state protein secondary structure (PSS) is widely used in the structural and functional analysis of proteins in bioinformatics. Recently, deep learning techniques have been applied in this research area and have raised Q8 accuracy remarkably. Nevertheless, from a theoretical standpoint, there is still a lot of room for improvement, specifically in 8-state (Q8) protein secondary structure prediction. In this paper, we present two deep learning architectures, namely 1D-Inception and BD-LSTM, to improve the performance of 8-class PSS prediction. The input of these two architectures is a carefully constructed feature matrix built from the sequence features and profile features of the proteins. First, 1D-Inception is a deep convolutional neural network-based approach that was inspired by the InceptionV3 model and contains three inception modules. Second, BD-LSTM is a recurrent neural network model that includes bidirectional LSTM layers. Our proposed 1D-Inception method achieved 76.65%, 71.18%, 76.86%, and 74.07% Q8 accuracy on the benchmark CullPdb6133, CB513, CASP10, and CASP11 datasets, respectively. Moreover, BD-LSTM achieved 74.71%, 69.49%, 74.07%, and 72.37% 8-state accuracy when evaluated on the CullPdb6133, CB513, CASP10, and CASP11 datasets, respectively. Both architectures enable efficient processing of local and global interdependencies between amino acids, which is very beneficial for making an accurate prediction of each class in a deep neural network. To the best of our knowledge, the experimental results show that the 1D-Inception model outperformed all the state-of-the-art methods on the benchmark CullPdb6133, CB513, and CASP10 datasets.

Datasets and Methodology. Datasets. Here, we utilize five different datasets, namely CullPdb 6133, CullPdb 6133 filtered, CB513, CASP10, and CASP11. Of these five datasets, CullPdb 6133 and CullPdb 6133 filtered are used for training, while CB513, CASP10, CASP11, and 272 protein sequences of CullPdb 6133 are used for testing. CullPdb 6133: The CullPdb 6133 [51] dataset is a non-homologous protein dataset provided by PISCES CullPDB with known secondary structure for each protein. This dataset contains a total of 6128 protein sequences, of which 5600 protein samples ([0:5600]) are used as the training set, 272 protein samples ([5605:5877]) for testing, and 256 protein samples ([5877:6133]) as the validation set. Moreover, the CullPdb 6133 (non-filtered) dataset has 57 features, such as amino acid residues (features [0:22)), N- and C-terminals (features [31:33)), relative and absolute solvent accessibility (features [33:35)), and sequence profiles (features [35:57)). We used the secondary structure notation (features [22:31)) for labeling. This CullPdb dataset is publicly obtainable from [2].
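
The index ranges quoted in the dataset description can be checked with a short sketch. It assumes every range is half-open (start included, end excluded), which the sample ranges in the text do not state explicitly; the names `SAMPLE_SPLITS`, `FEATURE_BLOCKS`, `sizes`, `is_partition`, and `select` are hypothetical and do not come from the cited paper.

```python
# Sample index ranges for CullPdb 6133, read as half-open (an assumption).
SAMPLE_SPLITS = {
    "train": range(0, 5600),
    "test": range(5605, 5877),
    "validation": range(5877, 6133),
}

# Per-residue feature index ranges, as listed in the description.
FEATURE_BLOCKS = {
    "residues": range(0, 22),
    "labels": range(22, 31),  # secondary structure notation
    "termini": range(31, 33),
    "solvent_accessibility": range(33, 35),
    "profile": range(35, 57),
}


def sizes(blocks):
    """Return the number of indices covered by each named range."""
    return {name: len(r) for name, r in blocks.items()}


def is_partition(blocks, total):
    """True if the ranges cover 0..total-1 exactly once, with no gaps."""
    covered = sorted(i for r in blocks.values() for i in r)
    return covered == list(range(total))


def select(row, block):
    """Pick the entries of one feature vector that belong to a block."""
    return [row[i] for i in block]


split_sizes = sizes(SAMPLE_SPLITS)
feature_vector = list(range(57))  # stand-in for one residue's 57 features
labels = select(feature_vector, FEATURE_BLOCKS["labels"])
```

Under the half-open reading, the three splits contain 5600, 272, and 256 samples, which sum to the 6128 sequences stated in the text, and the five feature blocks tile all 57 features. The sample splits do not tile all 6133 indices, because indices 5600 to 5604 fall in none of the quoted ranges.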
