2019
DOI: 10.1101/855957
Preprint

Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations

Abstract: In order to deal with the huge number of novel protein-coding variants being identified by genome and exome sequencing studies, many computational phenotype predictors have been developed. Unfortunately, such predictors are often trained and evaluated on different protein variant datasets, making a direct comparison between predictors very difficult. Moreover, training and testing datasets may also overlap, introducing training bias. In this study, we use 29 previously published deep mutational scanning (DMS) …

Cited by 51 publications (119 citation statements)
References 100 publications
“…1c, gray distributions). Another recent analysis also found a method heavily relying on evolutionary information as one of the best performers on DMS data, although more sophisticated than our naïve approach [48,76].…”
Section: Family Conservation Carries Most Important Signal
Citation type: mentioning
confidence: 78%
“…DMS datasets constitute a uniquely valuable resource for the evaluation of current SAV effect prediction methods [17,47,48], not the least, because most have not used those data. The Fowler lab has, recently, published an excellent analysis of prediction methods on DMS datasets and developed a new regression-based prediction method, Envision, trained only on DMS data [49].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Although the bioinformatic predictions were modestly correlated with our experimental measurements for variants in the functional validation set, they were markedly less concordant for other variants at those same residues, or throughout MSH2 at large (Figure 4B). Similarly weak overall agreement has also been observed when benchmarking bioinformatic classifiers with deep mutational scans of other genes [61][62][63]. As variant effect predictors are often trained on the limited number of known variants with available classifications, their divergence with our experimental measurements may reflect overfitting, further suggesting that the comparison to a small set of functionally characterized alleles overestimates bioinformatic predictors' performance.…”
Section: Loss Of Function Scores Outperform Bioinformatic Predictors
Citation type: mentioning
confidence: 82%
“…Importantly, we excluded any predictors trained using supervised learning techniques, as well as meta-predictors that utilise the outputs of other predictors, thus including only predictors we labelled as unsupervised and empirical in our recent study [10]. This is due to the fact that predictors based upon supervised learning are likely to have been directly trained on some of the same mutations used in our evaluation dataset, making a fair comparison impossible [10,50]. A few predictors perform substantially better than FoldX, with the best performance seen for SIFT4G [51], a modified version of the SIFT algorithm [52].…”
Section: Results
Citation type: mentioning
confidence: 99%
“…Although different approaches vary in their implementation, a few types of information are most commonly used, including evolutionary conservation, changes in physiochemical properties of amino acids, biological function, known disease association and protein structure [7]. While these predictors are clearly useful for variant prioritisation, and show a statistically significant ability to distinguish known pathogenic from benign variants, they still make many incorrect predictions [8][9][10], and the extent to which we can rely on them for diagnosis remains limited [11].…”
Section: Introduction
Citation type: mentioning
confidence: 99%