2016
DOI: 10.1093/nar/gkw120

Robust classification of protein variation using structural modelling and large-scale data integration

Abstract: Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than detailed interpretation of variant deleteriousness and frequently use only sequence-based or structure-based information. We present VIPUR, a computational framework that seamlessly integrates sequence analysis and structural modelling (using the Rosetta protein modelling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9477 protein variants with known effects on…

Cited by 55 publications (70 citation statements)
References 49 publications
“…As demonstrated here and in other works (Adzhubei et al., ; Baugh et al., ; Carter et al., ; Folkman et al., ; Hecht et al., ; Redler et al., ; Yue et al., ), analysis based on protein structure provides an orthogonal approach that, in spite of its own accuracy limitations, can sometimes provide valuable insight into the atomic level mechanisms in play. In particular, as with other monogenic disease‐related mutations (Yue et al., ), for NAGLU, structure analysis shows a large fraction operate by destabilizing protein three‐dimensional structure.…”
Section: Discussion (supporting)
confidence: 65%
“…A few make use of three‐dimensional structure information, particularly to infer any thermodynamic destabilization of the structure (Yue, Li, & Moult, ; Redler, Das, Diaz, & Dokholyan, ), assuming that decreased protein activity implies a relationship to disease. Some methods combine both sequence and structure information (Adzhubei et al., ; Baugh et al., ; Carter, Douville, Stenson, Cooper, & Karchin, ; Folkman, Stantic, Sattar, & Zhou, ; Hecht, Bromberg, & Rost, ; Li et al., ). Methods usually use supervised machine learning such as random forest (Carter et al., ; Li et al., ; Niroula et al., ), neural network (Hecht et al., ), and support vector machines (SVMs) (Calabrese et al., ; Kircher et al., ; Yue & Moult, ), or models that do not need training (Choi et al., ; Chun & Fay, ; Lichtarge et al., ; Ng & Henikoff, ; Thomas et al., ).…”
Section: Introduction (mentioning)
confidence: 99%
“…Variants were classified as deleterious (potentially pathogenic), benign, or of unknown clinical significance, by the algorithms SIFT, PolyPhen‐2, Mutation Taster, Mutation Assessor (implemented in the dbNSFP database), and VIPUR (Baugh et al. ). Deleterious mutations and variants of unknown clinical significance were classified as related or unrelated to the phenotype, and as recessive mutations, or mutations with no known disease associations, using the American College of Medical Genetics and Genomics (ACMG) guidelines.…”
Section: Methods (mentioning)
confidence: 99%
“…Evaluating the structural impact of a mutation, and the associated change in the Gibbs free energy of protein folding (ΔΔG), can assist in predicting the deleteriousness of a mutation (Glusman et al, , p. 113), can offer a mechanism explaining how a particular mutation produces a particular phenotype (Nielsen et al, ), and could potentially guide the selection of treatment strategies and the development of targeted therapeutics to combat mutation effects (Albanaz, Rodrigues, Pires, & Ascher, ). While many tools exist for predicting the ΔΔG of mutations (Barlow et al, ; Baugh et al, ; Capriotti, Fariselli, & Casadio, ; Dehouck, Kwasigroch, Gilis, & Rooman, ; Kellogg, Leaver‐Fay, & Baker, ; Park et al, ; Pires, Ascher, & Blundell, ; Schymkowitz et al, ), the accuracy of those tools is difficult to ascertain. Most of the tools have been trained and validated on the same data set of experimentally measured ΔΔG values (Bava, Gromiha, Uedaira, Kitajima, & Sarai, ), and while they generally report good accuracies on that data set, the results are more varied when it comes to new mutations that had not been evaluated previously (Buß, Rudat, & Ochsenreither, ; Geng, Xue, Roel‐Touris, & Bonvin, ; Khan & Vihinen, ; Kroncke et al, ; Potapov, Cohen, & Schreiber, ).…”
Section: Introduction (mentioning)
confidence: 99%
“…ELASPIC is a meta-predictor, developed by our lab, which uses the gradient-boosted decision tree algorithm (Friedman, 2002) to integrate predictions made by Provean, empirical energy terms calculated using FoldX, as well as other features, to predict the ΔΔG of mutations. ELASPIC falls in the category of methods which use both sequence and structural information to predict the ΔΔG of mutations, with examples of other methods in this category being DUET (Pires, Ascher, & Blundell, 2014a), VIPUR (Baugh et al, 2016), and STRUM (Quan, Lv, & Zhang, 2016). In every case, sequence and structural features are integrated using machine learning algorithms trained on datasets of experimentally-measured ΔΔG values (Bava et al, 2004;Moal & Fernández-Recio, 2012).…”
Section: Introduction (mentioning)
confidence: 99%