On the Relevance of Sophisticated Structural Annotations for Disulfide Connectivity Pattern Prediction

Becker, Julien; Maes, Francis; Wehenkel, Louis

doi:10.1371/journal.pone.0056621

Cited by 8 publications

(17 citation statements)

References 41 publications

“…Indeed, contrary to the observation made in [1] that suggested a very small number of relevant feature functions in the context of disulfide bridge prediction, the selection algorithm identified here a larger set of interesting feature functions.…”

Section: Results (mentioning)

confidence: 56%

“…For this purpose, we consider various feature encodings and, in addition to the primary structure, three in silico annotations: position-specific scoring matrices (PSSM), predicted secondary structures and predicted solvent accessibilities. We apply the feature function selection pipeline in combination with Extremely randomized Trees (ETs), a model which gave excellent results in previous work [1]. In order to avoid any risk of overfitting or over-estimation of our models, we use three distinct datasets: Disorder723 [19], Casp10 (http://www.predictioncenter.org/casp10/) and Pdb30.…”

Section: Introduction (mentioning)

confidence: 99%

On the Encoding of Proteins for Disordered Regions Prediction

2013

Self Cite

Disordered regions, i.e., regions of proteins that do not adopt a stable three-dimensional structure, have been shown to play various and critical roles in many biological processes. Predicting and understanding their formation is therefore a key sub-problem of protein structure and function inference. A wide range of machine learning approaches have been developed to automatically predict disordered regions of proteins. One key factor in the success of these methods is the way in which protein information is encoded into features. Recently, we have proposed a systematic methodology to study the relevance of various feature encodings in the context of disulfide connectivity pattern prediction. In the present paper, we adapt this methodology to the problem of predicting disordered regions and assess it on proteins from the 10th CASP competition, as well as on a very large subset of proteins extracted from PDB. Our results, obtained with ensembles of extremely randomized trees, highlight a novel feature function encoding the proximity of residues according to their accessibility to the solvent, which plays the second most important role in the prediction of disordered regions, just after evolutionary information. Furthermore, even though our approach treats each residue independently, our results are very competitive in terms of accuracy with respect to the state of the art. A web application is available at http://m24.giga.ulg.ac.be:81/x3Disorder.
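
The citation statement and abstract above describe encoding per-residue annotations (predicted secondary structure, predicted solvent accessibility) into fixed-length features around a residue. The following is a minimal sketch under stated assumptions, not the authors' code: it assumes 3-state secondary structure labels (H/E/C), 2-state accessibility labels (b/e), odd window sizes, and zero-padding beyond the sequence ends. The names `one_hot`, `window` and `encode_residue` are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed label alphabets (hypothetical choice for this sketch).
SS_STATES = "HEC"  # helix, strand, coil
SA_STATES = "be"   # buried, exposed


def one_hot(labels: str, alphabet: str) -> List[List[float]]:
    """Encode each per-residue label as a one-hot vector over `alphabet`."""
    return [[1.0 if c == a else 0.0 for a in alphabet] for c in labels]


def window(rows: Sequence[Sequence[float]], center: int, half: int) -> List[float]:
    """Concatenate rows center-half..center+half, zero-padding past either terminus."""
    width = len(rows[0])
    out: List[float] = []
    for i in range(center - half, center + half + 1):
        out.extend(rows[i] if 0 <= i < len(rows) else [0.0] * width)
    return out


def encode_residue(annotations: Dict[str, List[List[float]]],
                   windows: Dict[str, int], center: int) -> List[float]:
    """Concatenate the windowed features of each selected annotation (sorted by name)."""
    features: List[float] = []
    for name in sorted(windows):
        features.extend(window(annotations[name], center, windows[name] // 2))
    return features


# Toy example: 9 residues, secondary structure window 5, accessibility window 3.
ss = one_hot("CCHHHHEEC", SS_STATES)
sa = one_hot("eebbbbbee", SA_STATES)
x = encode_residue({"ss": ss, "sa": sa}, {"ss": 5, "sa": 3}, center=0)
print(len(x))  # 3 * 2 + 5 * 3 = 21
```

Each (annotation, window size) pair here plays the role of one "feature function"; treating each residue independently, as the abstract mentions, amounts to calling `encode_residue` once per position.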

“…The method proposed by Becker et al. [6] employs three different classification algorithms for the prediction of disulfide bonding probabilities: k-nearest neighbors, SVMs, and extremely randomized trees. Therefore, they propose a feature function selection, which determines a subset of feature functions and the best setting for associated window sizes.…”

Section: Methods (mentioning)

confidence: 99%
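
The selection step described in this citation statement, choosing a subset of feature functions together with a window size for each, can be sketched as a greedy forward search. This is an illustrative assumption, not necessarily the exact procedure of Becker et al.: `evaluate` stands in for a cross-validated score of a classifier (for instance extremely randomized trees) trained on the corresponding encoding, and `toy_score` is a hypothetical scorer used only to make the example runnable.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Config = Dict[str, int]  # feature function name -> window size


def forward_selection(candidates: Iterable[str], window_sizes: Iterable[int],
                      evaluate: Callable[[Config], float]) -> Tuple[Config, float]:
    """Greedily add the (feature function, window size) pair that most improves `evaluate`."""
    remaining = set(candidates)
    sizes = list(window_sizes)
    selected: Config = {}
    best_score = evaluate(selected)
    while remaining:
        trial: Optional[Tuple[float, str, int]] = None
        for name in sorted(remaining):
            for w in sizes:
                score = evaluate({**selected, name: w})
                if trial is None or score > trial[0]:
                    trial = (score, name, w)
        if trial is None or trial[0] <= best_score:
            break  # no remaining candidate improves the score
        best_score, name, w = trial
        selected[name] = w
        remaining.discard(name)
    return selected, best_score


def toy_score(cfg: Config) -> float:
    """Hypothetical scorer: rewards two 'true' pairs and penalizes each extra function."""
    target = {"pssm": 15, "ss": 5}
    hits = sum(1 for k, v in cfg.items() if target.get(k) == v)
    return hits - 0.1 * len(cfg)


selected, score = forward_selection(["aa", "pssm", "sa", "ss"], [5, 15], toy_score)
print(selected, round(score, 2))  # {'pssm': 15, 'ss': 5} 1.8
```

In practice each call to `evaluate` means retraining and cross-validating a model, so the cost grows with the number of candidate functions times the number of window sizes per round.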

“…Sequence alignment is a standard technique in bioinformatics for visualizing the relationships between residues in a collection of evolutionary or structurally related proteins. Existing DCP algorithms in the literature have used multiple sequence alignment, position-specific scoring matrices (PSSMs) [6], [7] and correlated mutations [4] as input encoding.…”

Section: Preliminary Concepts (mentioning)

confidence: 99%

Soft Computing Methods for Disulfide Connectivity Prediction

Chamorro; Aguilar-Ruiz

2015

Evol Bioinform Online

The problem of protein structure prediction (PSP) is one of the main challenges in structural bioinformatics. To tackle this problem, PSP can be divided into several subproblems. One of these subproblems is the prediction of disulfide bonds. The disulfide connectivity prediction problem consists of identifying which nonadjacent cysteines would be cross-linked from all possible candidates. Determining the disulfide bond connectivity between the cysteines of a protein is desirable as a step prior to 3D PSP, since it greatly reduces the protein conformational search space. The most representative soft computing approaches of the last decade for the disulfide bond connectivity prediction problem are summarized in this paper. Certain aspects, such as the different methodologies based on soft computing approaches (artificial neural networks or support vector machines) or the features of the algorithms, are used to classify these methods.
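
To illustrate why fixing the connectivity pattern matters, note that if all 2n candidate cysteines of a protein are bonded, the number of possible patterns is (2n-1)!! = (2n-1)(2n-3)...1, e.g. 945 patterns for 10 cysteines. The sketch below assumes pairwise bonding probabilities are already available (for example from one of the classifiers discussed on this page) and that every candidate cysteine is bonded; the probability values are made up. It enumerates patterns exhaustively, which is only practical for a handful of cysteines; larger cases are usually handled with a maximum-weight matching algorithm such as Edmonds' blossom algorithm.

```python
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[int, int]


def count_patterns(n_cys: int) -> int:
    """Number of connectivity patterns, (n_cys - 1)!!, when all n_cys (even) cysteines are bonded."""
    total = 1
    for k in range(n_cys - 1, 0, -2):
        total *= k
    return total


def perfect_matchings(cys: List[int]) -> Iterator[List[Pair]]:
    """Enumerate every way to pair up an even-length list of cysteine positions."""
    if not cys:
        yield []
        return
    first, rest = cys[0], cys[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def best_pattern(cys: List[int], prob: Dict[Pair, float]) -> Tuple[List[Pair], float]:
    """Exhaustively pick the pattern maximizing the summed pairwise bonding scores."""
    best: List[Pair] = []
    best_score = float("-inf")
    for pattern in perfect_matchings(sorted(cys)):
        score = sum(prob[p] for p in pattern)
        if score > best_score:
            best, best_score = pattern, score
    return best, best_score


# Made-up pairwise bonding probabilities for four cysteines (keys ordered i < j).
cys = [3, 10, 25, 40]
prob = {(3, 10): 0.9, (3, 25): 0.1, (3, 40): 0.2,
        (10, 25): 0.3, (10, 40): 0.1, (25, 40): 0.8}
print(count_patterns(4), count_patterns(10))  # 3 945
print(best_pattern(cys, prob)[0])             # [(3, 10), (25, 40)]
```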

“…As the position specific scoring matrix encodes the evolutionary information of a protein, the feature derived from PSSM has been widely and successfully applied to disulfide connectivity predictions [23], [34], [43]. In this study, we extract the PSSM feature as follows: the original PSSM of a given protein sequence is obtained by executing PSI-BLAST [44] to search the Swiss-Prot database through three iterations with a default E-value cutoff; then, we transform the original PSSM to a normalized one by applying the logistic function f(x) = 1/(1 + e^(-x)) to each element x contained in the original PSSM; finally, each cysteine residue is encoded into a 13 × 20 = 260-D feature vector that consists of the normalized PSSM elements corresponding to a sequence segment of length 13 centered on the cysteine residue [23], [34].…”

Section: Position Specific Scoring Matrix Feature (mentioning)

confidence: 99%
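
The PSSM encoding described in this quote (logistic normalization, then a 13-residue window centered on the cysteine, giving 13 × 20 = 260 values) can be sketched as below. Running PSI-BLAST is out of scope here: the sketch assumes the PSSM has already been parsed into a list of 20-column rows, and it zero-pads positions that fall outside the sequence, a choice the quote does not specify. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_pssm(pssm: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map every PSSM entry x to the logistic value 1 / (1 + e^(-x))."""
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm]


def cysteine_features(norm_pssm: Sequence[Sequence[float]], center: int,
                      length: int = 13) -> List[float]:
    """Concatenate the normalized PSSM rows of a `length`-residue segment centered on `center`.

    Rows falling outside the sequence are zero-padded (an assumption of this sketch).
    """
    half = length // 2
    width = len(norm_pssm[0])  # 20 amino-acid columns in a standard PSSM
    features: List[float] = []
    for i in range(center - half, center + half + 1):
        features.extend(norm_pssm[i] if 0 <= i < len(norm_pssm) else [0.0] * width)
    return features


# Toy PSSM: 30 residues x 20 columns of zeros, so every normalized entry is 0.5.
toy = normalize_pssm([[0.0] * 20 for _ in range(30)])
x = cysteine_features(toy, center=2)
print(len(x))  # 260
```

Zero-padding is distinguishable from real positions here because the logistic function maps every real PSSM score into the open interval (0, 1), never exactly 0.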

Disulfide Connectivity Prediction Based on Modelled Protein 3D Structural Information and Random Forest Regression

et al. 2015

IEEE/ACM Trans. Comput. Biol. and Bioinf.

Disulfide connectivity is an important protein structural characteristic. Accurately predicting disulfide connectivity solely from protein sequence helps to improve the intrinsic understanding of protein structure and function, especially in the post-genome era, in which large volumes of sequenced but functionally unannotated proteins are rapidly accumulating. In this study, a new feature extracted from the predicted protein 3D structural information is proposed and integrated with traditional features to form discriminative features. Based on the extracted features, a random forest regression model is trained to predict protein disulfide connectivity. We compare the proposed method with popular existing predictors by performing both cross-validation and independent validation tests on benchmark datasets. The experimental results demonstrate the superiority of the proposed method over existing predictors. We believe the superiority of the proposed method benefits from both the good discriminative capability of the newly developed features and the powerful modelling capability of the random forest. The web server implementation, called TargetDisulfide, and the benchmark datasets are freely available at: http://csbio.njust.edu.cn/bioinf/TargetDisulfide for academic use.
