The decreasing cost of human exome and genome sequencing provides new opportunities for diagnosing genetic disorders, but we need better and more robust methods for interpreting sequencing results, including determining whether, and by which mechanism, a specific missense variant may be pathogenic. Using the protein PTEN (phosphatase and tensin homolog) as an example, we show how recent developments in both experiments and computational modelling can be used to determine whether a missense variant is likely to be pathogenic. One approach relies on multiplexed experiments that measure the effect of all possible individual missense variants in a cellular assay. Another approach uses computational methods to predict variant effects. We compare two different multiplexed experiments and two computational methods for classifying variant effects in PTEN. We distinguish between methods that focus on effects on protein stability and protein-specific methods that are more directly related to enzyme activity. Our results on PTEN suggest that ~60% of pathogenic variants cause loss of function because they destabilise the folded protein, which is subsequently degraded. Methods that quantify a broader range of effects on PTEN activity perform better at predicting variant effects. Each experimental method performs better than the corresponding computational predictions.
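A common way to compare how well different scores (multiplexed-assay readouts or computational predictions) separate known pathogenic from benign variants is the area under the ROC curve. The sketch below is illustrative and is not the authors' analysis. It assumes each method gives one score per variant where lower means more loss of function; the helper names, the 0.5 threshold, and all scores are hypothetical.

```python
from itertools import product

# Hypothetical helpers. Scores are assumed to be "lower = more damaging",
# e.g. a normalised abundance or activity score from a multiplexed assay.


def roc_auc(pathogenic, benign):
    """Mann-Whitney estimate of ROC AUC: the probability that a randomly chosen
    pathogenic variant scores lower than a randomly chosen benign one.
    Ties count as half a win."""
    if not pathogenic or not benign:
        raise ValueError("need at least one pathogenic and one benign score")
    wins = 0.0
    for p, b in product(pathogenic, benign):
        if p < b:
            wins += 1.0
        elif p == b:
            wins += 0.5
    return wins / (len(pathogenic) * len(benign))


def fraction_below(scores, threshold):
    """Fraction of variants whose score falls below a (hypothetical) threshold,
    e.g. the share of pathogenic variants classed as low abundance."""
    if not scores:
        raise ValueError("empty score list")
    return sum(s < threshold for s in scores) / len(scores)


if __name__ == "__main__":
    # Toy scores for illustration only; these are not PTEN data.
    pathogenic = [0.1, 0.2, 0.6]
    benign = [0.5, 0.9, 1.0]
    print(f"AUC = {roc_auc(pathogenic, benign):.3f}")
    print(f"fraction of pathogenic below 0.5 = {fraction_below(pathogenic, 0.5):.3f}")
```

With these toy values, 8 of the 9 pathogenic/benign pairs are ordered correctly, so the AUC is 8/9 (about 0.889). Running the same function on each method's scores gives a like-for-like comparison.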
Calculating changes in protein stability (ΔΔG) has been shown to be central to predicting the consequences of single amino acid substitutions, both in protein engineering and in interpreting genomic variants for disease risk. Structure-based calculations are considered the most accurate; however, the tools used to calculate ΔΔG values were developed using experimentally resolved structures. Extending these calculations to homology models built from related proteins would greatly broaden their applicability, because large parts of, for example, the human proteome have not been structurally resolved. Here we investigate the accuracy of ΔΔG values predicted on homology models compared with those predicted on crystal structures. Specifically, we identified four proteins with many experimentally measured ΔΔG values and with homology-modeling templates spanning a broad range of sequence identities, and we selected three ΔΔG calculation methods to test. We find that ΔΔG values predicted from homology models agree with experimental ΔΔGs as well as those predicted from experimentally determined crystal structures, provided the template shares at least 40% sequence identity with the target protein. In particular, the Rosetta cartesian_ddg protocol is robust to the small structural perturbations that homology modeling introduces. In an independent assessment, we observe a similar trend when using ΔΔGs to categorize variants as having low or wild-type-like abundance. Overall, our results show that stability calculations performed on homology models can substitute for those on crystal structures with acceptable accuracy, as long as the model is built on a template with at least 40% sequence identity to the target protein.
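Agreement between predicted and experimental ΔΔGs is often summarised with a correlation coefficient, and a ΔΔG cutoff can turn predictions into low versus wild-type-like abundance calls. The sketch below illustrates both steps under stated assumptions and is not the study's protocol. It assumes that positive ΔΔG denotes destabilisation; the function names, the 2.0 kcal/mol cutoff, and all numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ΔΔG values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def classify_abundance(ddgs, cutoff=2.0):
    """Label each variant 'low' if its ΔΔG (kcal/mol) exceeds the cutoff,
    otherwise 'wild-type-like'. Assumes positive ΔΔG = destabilising;
    the default cutoff is a hypothetical choice."""
    return ["low" if d > cutoff else "wild-type-like" for d in ddgs]


if __name__ == "__main__":
    # Toy values (kcal/mol) for illustration only; not data from the study.
    experimental = [0.5, 1.0, 2.5, 3.0]
    from_crystal = [0.4, 1.2, 2.2, 3.1]
    from_model = [0.6, 0.9, 2.7, 2.8]
    print("r (crystal) =", round(pearson_r(from_crystal, experimental), 3))
    print("r (model)   =", round(pearson_r(from_model, experimental), 3))
    print(classify_abundance(from_model))
```

Computing r for crystal-based and model-based predictions against the same experimental set mirrors the comparison described above. With real data one would use many more variants and might also report a rank correlation.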