Rational Design of Temperature-Sensitive Alleles Using Computational Structure Prediction

Poultney, Christopher S.; Butterfoss, Glenn L.; Gutwein, Michelle; Drew, Kevin; Gresham, David; Gunsalus, Kristin C.; Shasha, Dennis; Bonneau, Richard

doi:10.1371/journal.pone.0023947

Cited by 22 publications

(29 citation statements)

References 23 publications

(21 reference statements)

“…However, we can utilize physicochemical data on the change of amino acids at a particular site ("residue change in charge, hydrophobicity, volume, molecular weight") as a means to approximate this. To complement these differences in physicochemical empirical data, we utilize a qualitative residue swap similarity metric that is 0 when both the unmutated and mutated amino acids belong to the same class (small nonpolar, small polar, negative charge, large nonpolar, bad behaved, positive charge, side chain amide) and 1 otherwise, as defined by Poultney et al [58]. Additionally, a static structural feature of potentially high descriptive value is "residue mean mutual information" which is the average value in bits at a particular residue in a mutual information matrix computed using MDEntropy.…”

Section: B Feature Design (mentioning)

confidence: 99%
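
The residue swap metric quoted above reduces to a class lookup, which can be sketched in Python as follows. The quoted statement names the seven classes but does not list their members, so the assignment of amino acids to classes here is a hypothetical illustration and not the definition from Poultney et al. The function name `swap_dissimilarity` is likewise invented for this sketch.

```python
# HYPOTHETICAL class membership: the seven class names come from the quoted
# statement, but which residues belong to each class is an assumption made
# only for illustration.
CLASS_MEMBERS = {
    "small nonpolar": "AV",
    "small polar": "ST",
    "negative charge": "DE",
    "large nonpolar": "FILMWY",
    "bad behaved": "CGP",
    "positive charge": "HKR",
    "side chain amide": "NQ",
}

# Invert to a one-letter-code -> class-name lookup table.
RESIDUE_CLASS = {
    residue: name
    for name, members in CLASS_MEMBERS.items()
    for residue in members
}


def swap_dissimilarity(native: str, mutant: str) -> int:
    """Return 0 if both residues share a class, 1 otherwise."""
    return 0 if RESIDUE_CLASS[native] == RESIDUE_CLASS[mutant] else 1


if __name__ == "__main__":
    print(swap_dissimilarity("D", "E"))  # both in the assumed negative class
    print(swap_dissimilarity("D", "K"))  # assumed negative vs. positive
```

Because the feature is binary, it only records whether a substitution crosses a class boundary. The continuous physicochemical differences mentioned in the same statement carry the magnitude of the change.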

Predicting Protein Thermostability Upon Mutation Using Molecular Dynamics Timeseries Data

Fleming, Kinsella, Ing. 2016. Preprint.

A large number of human diseases result from disruptions to protein structure and function caused by missense mutations. Computational methods are frequently employed to assist in the prediction of protein stability upon mutation. These methods utilize a combination of protein sequence data, protein structure data, empirical energy functions, and physicochemical properties of amino acids. In this work, we present the first use of dynamic protein structural features in order to improve stability predictions upon mutation. This is achieved through the use of a set of timeseries extracted from microsecond timescale atomistic molecular dynamics simulations of proteins. Standard machine learning algorithms using mean, variance, and histograms of these timeseries were found to be 60-70% accurate in stability classification based on experimental ΔΔG or protein-chaperone interaction measurements. A recurrent neural network with full treatment of timeseries data was found to be 80% accurate according to the F1 score. The performance of our models was found to be equal to or better than two recently developed machine learning methods for binary classification as well as two industry-standard stability prediction algorithms. In addition to classification, understanding the molecular basis of protein stability disruption due to disease-causing mutations is a significant challenge that impedes the development of drugs and therapies that may be used to treat genetic diseases. The use of dynamic structural features allows for novel insight into the molecular basis of protein disruption by mutation in a diverse set of soluble proteins. To assist in the interpretation of machine learning results, we present a technique for determining the importance of features to a recurrent neural network using Garson's method. We propose a novel extension of neural interpretation diagrams by implementing Garson's method to scale each node in the neural interpretation diagram according to its relative importance to the network.
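
As a rough illustration of the Garson's method mentioned in this abstract, the sketch below computes relative input importance for a feed-forward network with one hidden layer and one output, which is the setting in which the method is usually described. It is not the authors' recurrent-network implementation. The function name `garson_importance`, the weight layout, and the example weights are all assumptions made for this sketch.

```python
def garson_importance(w_in, w_out):
    """Relative importance of each input under Garson's weight partitioning.

    w_in[i][j] : weight from input i to hidden unit j (assumed layout)
    w_out[j]   : weight from hidden unit j to the single output
    Assumes every hidden unit has at least one nonzero incoming weight.
    """
    n_inputs = len(w_in)
    n_hidden = len(w_out)

    # Total absolute incoming weight for each hidden unit.
    hidden_totals = [
        sum(abs(w_in[i][j]) for i in range(n_inputs)) for j in range(n_hidden)
    ]

    # Share of each hidden unit attributed to input i, scaled by the
    # absolute hidden-to-output weight and summed over hidden units.
    raw = [
        sum(
            abs(w_in[i][j]) / hidden_totals[j] * abs(w_out[j])
            for j in range(n_hidden)
        )
        for i in range(n_inputs)
    ]

    # Normalize so the importances sum to 1.
    total = sum(raw)
    return [value / total for value in raw]


if __name__ == "__main__":
    # Made-up weights for a 2-input, 2-hidden-unit network.
    weights_in = [[1.0, -1.0], [3.0, 1.0]]
    weights_out = [2.0, -2.0]
    print(garson_importance(weights_in, weights_out))
```

Because the method uses absolute weights, it ranks inputs by magnitude of influence and says nothing about the direction of the effect.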

“…Both protocols 1) substitute the native residue for the variant amino acid, 2) refine the variant structure, including protein backbone movements, to accommodate this change, and 3) compare the output structures using the Rosetta score terms (Figure 1, Supplementary Figure S2). To generate features for each variant, we follow Poultney et al (40) and normalize structure-based features by comparing scores for a given variant to scores derived from Rosetta-relaxed ensembles of its native protein. We also include the accessible surface area at the position of variation as a feature, calculated using PROBE (52).…”

Section: Sequence-based Features From BLAST Analysis (mentioning)

confidence: 99%

“…We include additional features describing the geometric differences between the input and final structure for each trajectory (e.g., RMSD and gdtmm) to detect proteins undergoing large rearrangements, totaling 23 features. To compare the native and variant ensembles and eliminate potential differences in score magnitude across diverse protein folds, we 1) extract the distributions of each Rosetta score term for the native and variant proteins, 2) calculate the quartiles of the variant protein score distributions, and 3) calculate the cumulative density for these quantiles on the corresponding native protein score distribution (40). FastRelax and quartile analysis produce three features per score term for each variant, corresponding to the Q1, Q2, and Q3 quartiles (40), totaling 60 features.…”

Section: Structure-based Features From Rosetta Analysis (mentioning)

confidence: 99%
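
The three-step quartile comparison in the statement above (extract the score distributions, take the variant quartiles, evaluate them on the native cumulative distribution) can be sketched for a single score term as follows. This is a minimal illustration using only the Python standard library, not the cited pipeline. The function name `quartile_features`, the use of an empirical CDF, the "inclusive" quantile method, and the example scores are all assumptions.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_features(native_scores, variant_scores):
    """Return the native empirical CDF evaluated at the variant Q1, Q2, Q3.

    Both arguments are sequences of values for ONE score term, taken from
    the native and variant structure ensembles respectively.
    """
    native_sorted = sorted(native_scores)
    n_native = len(native_sorted)

    # Step 2: quartiles of the variant score distribution.
    q1, q2, q3 = quantiles(variant_scores, n=4, method="inclusive")

    # Step 3: fraction of native scores at or below each variant quartile.
    return tuple(
        bisect_right(native_sorted, q) / n_native for q in (q1, q2, q3)
    )


if __name__ == "__main__":
    native = list(range(1, 101))          # made-up native ensemble scores
    shifted = [s + 40 for s in native]    # made-up variant with higher scores
    print(quartile_features(native, native))
    print(quartile_features(native, shifted))
```

Each feature lies between 0 and 1 regardless of the absolute score scale, which is what makes the features comparable across different protein folds. A variant whose ensemble matches the native one yields values near 0.25, 0.5, and 0.75, and values pushed toward 1 indicate that the variant scores sit above most of the native distribution.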

“…Each of these methods predicts a label that is designed to correlate with variant deleteriousness and is used to prioritize causal pathogenic variants from large genomic datasets (10). Deleteriousness can be approximated with measures of conservation and molecular functionality but available data on both protein sequence variation and structural energetics are rarely combined (6,40,43). Selection against deleterious variants can be detected by analysis of conservation and other alignment-based methods, although these metrics may not apply to de novo mutations.…”

Section: Introduction (mentioning)

confidence: 99%

“…We obtained structural models for these proteins from solved crystal structures and comparative modeling initiatives, such as ModBase (39), taking advantage of reliable homology models freely available for most human proteins. Structural analysis is performed using Rosetta to rigorously sample variant protein conformations, properly accommodating the variant amino acid by moving the protein backbone (20,40,49). We combine sequence-based and structure-based features in a sparse logistic regression framework, leading to a classifier that accurately ranks deleterious variants, with ≥90% precision on the highest scoring 3,800 variants (40% of variants classified) and 0.872 Area Under the Precision-Recall curve (AUPR).…”

Section: Introduction (mentioning)

confidence: 99%

Robust Classification of Protein Variation Using Structural Modeling and Large-Scale Data Integration

Baugh, Simmons-Edler, Mueller, et al. 2015. Preprint. Self-citation.

Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than detailed interpretation of variant deleteriousness and frequently use only sequence-based or structure-based information. We present VIPUR, a computational framework that seamlessly integrates sequence analysis and structural modeling (using the Rosetta protein modeling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9,477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to mutations in any organism's proteome with improved generalized accuracy (AUROC .83) and interpretability (AUPR .87) compared to other methods. We demonstrate that VIPUR's predictions of deleteriousness match the biological phenotypes in ClinVar and provide a clear ranking of prediction confidence. We use VIPUR to interpret known mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of mutations associated with human diseases. Lastly we demonstrate VIPUR's ability to highlight candidate genes associated with human diseases by applying VIPUR to de novo variants associated with autism spectrum disorders.

Application of CRISPR–Cas Technology in Drug Development

Altaf, Saleem, Ikram, et al. 2024. Trends in Plant Biotechnology.
