Machine Learning: How Much Does It Tell about Protein Folding Rates?

Corrales, Marc; Cuscó, Pol; Usmanova, Dinara R.; Chen, Heng‐Chang; Bogatyreva, Natalya S.; Filion, Guillaume J.; Ivankov, Dmitry N.

doi:10.1371/journal.pone.0143166

Cited by 18 publications

(18 citation statements)

References 36 publications

(66 reference statements)


“…To summarize, the theoretical and semi-empirical methods (that use such meaningful parameters as the chain length [51,69,70], protein globule cross-section [71], α-helical content [72], locality of contacts [73], contact order [74,75], etc., but do not use or use a very small number of adjustable parameters) show better predictive power and correlation with experiment than the current machine learning techniques that use too many adjustable parameters (provided that correlations are obtained on testing and not training sets) [99]. Given the still relatively low number of experimental points, the purely statistical and machine learning techniques can be currently useful only for fine-tuning small second-order corrections to the existing rough but physically or biologically meaningful estimates, or for finding relatively small corrections for parameters already known to play a physically or biologically meaningful role in folding [83].…”

Section: Discussion (mentioning)

confidence: 99%

“…To study this phenomenon, which, in principle, can lead to drastically worse results obtained for the testing sets than those reported for the training sets, Corrales et al checked three machine learning methods by applying them to new data, data not used when building the models [99]. It turned out that for all three considered machine learning methods the obtained correlations were significantly worse than those declared in the original publications.…”

Section: Refinement Of Existing Estimates Of Protein Folding Times (mentioning)

confidence: 98%

“…To illustrate the problem of the low amount of available data, Corrales et al built the model having 19 adjustable parameters based on the amino acid occurrences in proteins having different folding times [99]:…”

Section: Refinement Of Existing Estimates Of Protein Folding Times (mentioning)

confidence: 99%

“…The same for the above described (see Equation (2) and Figures 2 and 3) model, where t = τ × exp[L^{2/3}], that has no adjustable parameters (right). Adapted from [99] with minor modifications.…”

Section: Refinement Of Existing Estimates Of Protein Folding Times (mentioning)

confidence: 99%
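
The citation statements above contrast correlations reported on training sets with those obtained on data not used to build the model, and mention a model with 19 adjustable parameters fitted to a small number of experimental points. The following is a minimal sketch of that general effect only. It assumes purely synthetic random data and an ordinary least-squares fit; it is not the model, data, or procedure of Corrales et al., and the function name `train_test_correlation` and the sample sizes are hypothetical choices made for illustration.

```python
import numpy as np


def train_test_correlation(n_train=60, n_test=60, n_params=19, seed=0):
    """Fit a linear model with n_params adjustable weights to pure noise.

    Returns (r_train, r_test): the Pearson correlation between predicted
    and "observed" values on the training set and on an independent test set.
    All data are synthetic; there is no real signal to learn.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical stand-ins for per-protein features (random numbers).
    x_train = rng.normal(size=(n_train, n_params))
    x_test = rng.normal(size=(n_test, n_params))

    # Hypothetical stand-ins for measured values, unrelated to the features.
    y_train = rng.normal(size=n_train)
    y_test = rng.normal(size=n_test)

    # Add an intercept column and fit by ordinary least squares.
    a_train = np.column_stack([np.ones(n_train), x_train])
    a_test = np.column_stack([np.ones(n_test), x_test])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)

    r_train = np.corrcoef(a_train @ coef, y_train)[0, 1]
    r_test = np.corrcoef(a_test @ coef, y_test)[0, 1]
    return r_train, r_test


if __name__ == "__main__":
    # Average over several random draws to smooth out sampling noise.
    results = np.array([train_test_correlation(seed=s) for s in range(20)])
    print("mean training correlation:", results[:, 0].mean())
    print("mean test correlation:    ", results[:, 1].mean())
```

Because the synthetic targets carry no signal, any training-set correlation in this sketch comes from the adjustable parameters alone, which is why such correlations are only informative when measured on a separate test set.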


Solution of Levinthal’s Paradox and a Physical Theory of Protein Folding Times

Ivankov, Finkelstein. 2020. Biomolecules.

Self Cite


“How do proteins fold?” Researchers have been studying different aspects of this question for more than 50 years. The most conceptual aspect of the problem is how protein can find the global free energy minimum in a biologically reasonable time, without exhaustive enumeration of all possible conformations, the so-called “Levinthal’s paradox.” Less conceptual but still critical are aspects about factors defining folding times of particular proteins and about perspectives of machine learning for their prediction. We will discuss in this review the key ideas and discoveries leading to the current understanding of folding kinetics, including the solution of Levinthal’s paradox, as well as the current state of the art in the prediction of protein folding times.


“…2ABD and 1ST7 are respectively, the bovine and yeast structures of the extensively studied four helix bundle ACBP (acyl-coenzyme A-binding protein), whose folding pathway could challenge our infeasible SSU restriction (section 2.5). ACBP is also interesting because, depending on experimental conditions, it can be classified as a multi-state folder [15,33]. 1QYS is a de novo protein with < 100 residues but exhibits non-cooperative behavior, and has a very stable intermediate structure [38].…”

Section: Protein Datasets (mentioning)

confidence: 99%

Folding with a protein's native shortcut network

Khor. 2018. Proteins.


A complex network approach to protein folding is proposed, wherein a protein's contact map is reconceptualized as a network of shortcut edges, and folding is steered by a structural characteristic of this network. Shortcut networks are generated by a known message passing algorithm operating on protein residue networks. It is found that the shortcut networks of native structures (SCN0s) are relevant graph objects with which to study protein folding at a formal level. The logarithm form of their contact order (SCN0_lnCO) correlates significantly with folding rate of two-state and nontwo-state proteins. The clustering coefficient of SCN0s (C ) correlates significantly with folding rate, transition-state placement and stability of two-state folders. Reasonable folding pathways for several model proteins are produced when C is used to combine protein segments incrementally to form the native structure. The folding bias captured by C is detectable in non-native structures, as evidenced by Molecular Dynamics simulation generated configurations for the fast folding Villin-headpiece peptide. These results support the use of shortcut networks to investigate the role protein geometry plays in the folding of both small and large globular proteins, and have implications for the design of multibody interaction schemes in folding models. One facet of this geometry is the set of native shortcut triangles, whose attributes are found to be well-suited to identify dehydrated intraprotein areas in tight turns, or at the interface of different secondary structure elements.


Circuit topology predicts pathogenicity of missense mutations

2022


The contact topology of a protein determines important aspects of the folding process. The topological measure of contact order has been shown to be predictive of the rate of folding. Circuit topology is emerging as another fundamental descriptor of biomolecular structure, with predicted effects on the folding rate. We analyze the residue-based circuit topological environments of 21 K mutations labeled as pathogenic or benign. Multiple statistical lines of reasoning support the conclusion that the number of contacts in two specific circuit topological arrangements, namely inverse parallel and cross relations, with contacts involving the mutated residue have discriminatory value in determining the pathogenicity of human variants. We investigate how results vary with residue type and according to whether the gene is essential. We further explore the relationship to a number of structural features and find that circuit topology provides nonredundant information on protein structures and pathogenicity of mutations. Results may have implications for the polymer physics of protein folding and suggest that "local" topological information, including residue-based circuit topology and residue contact order, could be useful in improving state-of-the-art machine learning algorithms for pathogenicity prediction.
